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I. Introduction 

A. Model Setup 

Consider the n-dimensional complex-valued observation model: 

y = AUXb + z (1) 
= AUv + z (2) 

where: 

• X = diag(x), and x is an iid complex Gaussian n-vector with components Xi ~ CJ\f{0,Vx)', 

• b is an iid n-vector with components bi ~ Bernoulli-g, i.e., P[6j = I] = q = 1 — F[bi = 0]; 

• V = Xb is a Bernoulli-Gaussian vector, with components Vi = Xihi; 

• A is an n X n diagonal matrix with iid diagonal elements [AJj^j ~ BernouUi-p, i.e., P[[A]j^i = 1] = 
p = l-P[[A],,i = 0]; 

• U is an n X n random matrix such thaQ 

R = U+A+AU (3) 



is free from any deterministic Hermitian matrix (see 11381 and references therein). 

• z is an iid complex Gaussian n-vector with components Zi ~ CA/'(0, 1); 

• A, U, X, b and z are mutually independent. 

• The signal-to-noise ratio (SNR) of the observation model ([!]) is defined as 

The non-zero elements of b define the support of the BernoulU-Gaussian vector v, whose "sparsity" 
(average fraction of non-zero elements) equal to q. The non-zero diagonal elements of A define the 
components of the product Uv for which a noisy measurement is acquired. In the literature, the number 
of non-zero diagonal elements of A is commonly referred to as the number of measurements. The 
"sampling rate" (average fraction of observed components) of the observation model ([1]) is equal to p. 
The sensing matrix AU is known to the signal processor, the goal of which is to detect the support of 
V, i.e., to find the position of the non-zero components of b. 

In this paper we are interested in the optimal performance of the recovery of the sparse signal support. 
Denoting the recovered support by b = . . . , bnY , with bi G {0, 1}, the objective is to minimize the 

'Superscript ^ indicates Hermitian transpose. 
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support recovery error rate: 

1 " 

D^'-\p,q,V,) = -y^nbi^bil (5) 

i=l 

where the expectation is with respect to A, U, X, b, and z. In particular, this works focuses on the large 
n regime 

D{p,q,V^) = lim D^^\p,q,V^) (6) 

71— >00 

under the optimal Maximum A Posteriori Symbol-By-Symbol (MAP-SB S) estimator, as well as under 
some popular suboptimal but practically implementable estimation algorithms. 

B. Existing results 

Recovery of the sparsity pattern with vanishing error probability is studied in a number of recent works 
such as m, HI, mi, 1271, (391, lIlQl- When k = Yh^^ h, the number of nonzero coefficients in v, is 
known beforehancj^ and their magnitude is bounded away from zero, exact support recovery requires that 
the number of measurements grow as klogn 041, BOl . If the support recovery error rate is allowed 
to be non- vanishing, fewer measurements are necessary. Under various assumptions. Hi! @> ll29l show 
that a number of measurements growing proportionally to k log ^ suffices. A more refined analysis is 
given by Reeves and Gastpar in ll29l . ll30l . IIBTI . |[32l . assuming that the entries of the measurement 
matrix are iid but without requiring the signal vector x to be Gaussian. They find tight bounds on the 
behavior of the proportionality constant as a function of SNR and the target support recovery error rate. In 
particular, 1311 upper bounds the required difference p—q when using an ML estimator of the support. The 
comparison given in ||3T1 . |[32l of computationally efficient algorithms such as linear MMSE estimation 
and Approximate Message Passing (AMP) to information theoretic bounds reveals that the suboptimality 
of those algorithms increases with SNR. In contrast to Q, |[32l considers a distortion measure which is 
the maximum of the false-alarm and missed detection probability. 

The recent work (31] gives results for iid Gaussian measurement matrices, based on the analysis of a 
message passing algorithm rather than the replica method. A full rigorization of the decoupling principle 
introduced in |18| has been recently announced in |8| for compressive sensing applications with iid 
measurement matrices. Another rigorous justification of previous replica-based results is given in [43 1 
which shows that iid Gaussian sensing matrices incur no penalty on the phase transition threshold with 
respect to an optimal nonlinear encoding. 

^Note that in our model, the number of nonzero coefficients is not known a priori but ^ — >■ g. 
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It is of considerable interest to explore the degree of improvement afforded by dropping the assumption 
that the measurement matrix has iid coefficients. Randomly sampled Discrete Fourier Transform (DFT) 
matrices (where rows/columns are deleted independently) e.g. ll37l are one example of such matrices. 



The model considered in Section I-A allows a relevant generalization of the iid measurement model, 
which is analytically tractable. 

C. Organization 

Section [11] gives expressions for the input-output mutual information rate, and shows how to use it in 
order to lower bound the support recovery error rate. We write the mutual information of interest as the 
difference of two mutual information rates. The first term is obtained using the heuristic replica-method, 
previously applied in various problems involving iid matrices, e.g. |[T8l . |[35l . ||28]| . |[T5]| . The second term 
is given rigorously, using free probability and large random matrix theory. 

Upper and lower bounds on the input-output mutual information corroborating the replica analysis are 
developed in Section III We also give a converse result that shows that Q is bounded away for zero if 



p < q. Numerical examples illustrate the tightness of the bounds. 

Section IV extends the decoupling principle ifTSl to the model in Q and provides the analysis of three 



support estimators: optimal MAP-SBS, thresholded linear MMSE and li relaxation (Lasso). 
Proofs and other technical details are given in the Appendices. 

II. Mutual information rate 
In this section we are concerned with the mutual information rate 

Z = lim -I(b;y|A,U) =2:1 -X2 (7) 

n— >-oo n 

where 

11 = lim -/(v;y|A,U) (8) 

n— ^-oo n 

12 = lim -/(x;y|A,U,b). (9) 

n— >oo n 

and the right-most equality in (|7]) follows from 

/(b;y|A,U) = /(x,b;y|A,U)-/(x;y|A,U,b) (10) 
= /(Xb;y|A,U)-/(x;y|A,U,b) (11) 
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A. Error rate lower bound via mutual information 

We can bound the minimal support recovery error rate D{p, q, Vx) defined in Q in terms of X using 
the following simple result. 

Theorem 1 Given a joint distribution Pxv on X x y, a reconstruction alphabet X and a distortion 
measure d : x Af i— )• [0, oo), let 

R{d) = inf I{X;X) (12) 

Then 

i?(inf E[d(X, X)]) < I{X; Y) (13) 
where the infimum is over all conditional probability assignments -P^jy such that Pxyx ~ PxPy\xPx\y- 
Proof: See Appendix [A] ■ 



Since R{d) is a monotonically decreasing function, (13 1 gives an information theoretic lower bound 
on the non- information-theoretic quantity inf E[d(X, X)]. In our case, using the rate-distortion function 
of a BernouUi-g source with Hamming distortion, given by R{d) = max{/i(g) — h{d),Q}, Theorem [T] 
results in 

D{p,q,Vx)>h-\h{q)-X) (14) 

where h{x) = a; log ^ + (1 — x) log x G [0, 1] denotes the binary entropy function, and where we 
assume q <\ (notice that X < h{q) by definition (m)). 



B. Mutual information rate X\ via replica method 

For any (X, y) ~ PxY, we denote the minimum mean-square error for estimating X from Y as 

mmse(X|y) = E[|X -E[X|y]p]. (15) 

With this definition, we have the following claim dependent on the validity of the replica method: 

Claim 1 Let Bq,Xo, Z be independent random variables, with Bq ~ Bernoulli-q, Xq ~ CA^(0, Vx), and 
Z ~ CJ\f{0, 1), and define Vq = XqBq. Let TZ-r{-) denote the R-transform of the random matrix R 
defined in Q. Then, 

{n^{-w)-r,)dw log e, (16) 
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where rj and x <^^^ the non-negative solutions of the system of equations: 

X = mmse (Vo\Vo + r]~2 . 



(17a) 
(17b) 



If the solution of {17a) - (17b) is not unique, then we select the solution that minimizes X\ given in [16), 
which corresponds to the "free energy" {up to an irrelevant additive constant) of a physical system with 
"quenched disorder parameters" y, A,U, "state" v ~ Pv(v) and unnormalized Boltzman distribution 
Py|v,A,u(y|v, A,U)pv(v), where 



Py|v,A,u(y|v, A,U) 



1 



exp 



|y-AUvf) 



(18) 



is the conditional transition probability density of the observation model ([7]), given A, U. 



Proof: See Appendix [B| 

The efficient calculation of I [Vq;Vq + t]^ ^ Z ) and of mmse ( VqI^o + ??^^) is addressed in Ap- 
pendix |h] 



C. Mutual information rate Xi via freeness 

Theorem 2 Let Vr(-) and rfR{-) denote the Shannon transform and rj-transform (see 
in Appendix [C]) o/ R defined in Q. Then, 

T2 = Vr(q V^) + q log (1 + uV^) - log(l + Q uV^) 

where a and v are the unique non-negative solutions of the system of equations 

1 1 



and definitions 



(19) 



+ l-q 



(20) 



Proof: See Appendix |C] 



D. Special Cases 

1) \J is an iid random matrix: Assuming U has iid entries with mean zero and variance ^, according 
to ||38l Theorem 2.39] the ?7-transform of R satisfies the relation 

1 - r^Yiix) 



1 - r/T(a;ryR(x)) 

with T = A^^A. Using the fact that A is diagonal with Bemoulli-p iid diagonal elements, 

P 



r]T{x) = r}A{x) = l-p + 



l + x 



(21) 



(22) 
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Using this in (21 1, we have that ??r(x) is the positive solution of the quadratic equation 

xrf — ((1 — p)x — 1)7] —1 = 0, 



(23) 



which corresponds to the ry-transform of a random matrix of the form HH^^, with H of dimension nxpn 
and lid elements with zero mean and variance 1/n. The R-transform of such matrix is well-known (see 
Example 2.27]) and takes on the form 

P 



1 



Hence, the fixed point equations ( 17a i - ( 17b) reduce to 

1 



1 



rj p 



1 + mmse Vb|Vb + ?7 



and ( [T6l ) takes on the form 

Ii = l(Vo;Vo + v-'^Z) +p[log 



+ 



1 loge 



(24) 



(25) 



(26) 



This is obtained from (16 1 using (24) for the R-transform and the identity ^ = ^ (1 + x)» from (17al. 
We notice that when p = 1 (26l coincides with the result in |[T8l . The formula provided by Claim [1] does 
not coincide with the result in ifTSl . ll28l for general p since in the model considered by ifTSl . ll28l the 
"channel matrix" AU is normalized such that the columns (and not the non-zero rows, as in our setting) 
have unit average squared norm conditioned on A. Instead, our formulas are consistent with those in 
lISTI . which uses the same row-energy normalization as in this paper. 



In order to calculate I2, we use (20 1 and obtain 



1 



1 



1 



Using the definition of S -transform (see Definition |3] In Appendix [C|), we have that 

1 



aVx = ^nimiaVx) - 1) 
from which, identifying terms, we obtain 



r] - 1+p, 



1 



(27) 



(28) 



(29) 



Sr (7?-1) 

where for simplicity we let 77 = rf^{aVx) and where the rightmost equality follows from the well- 
known explicit expression = valid when U is an iid matrix. Replacing (29) in the equality 



r/ = ij^^-p + 1 — g in (20 1, we obtain 

T] = 



l + {ri-l+p)V, 



+ l-q. 



(30) 
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Defining Q = v /p can rewrite (30 1 as 



1 



+ 



(31) 



Hence, Q is seen to satisfy a well-known fixed-point equation yielding Q = ^gjjt where H is a 

pn X qn matrix with iid with variance l/{pn) (see |[38l Eq. (2.120)]). Using |[38l Eq. (2.121)], Q can be 
obtained in closed form as 



1 



4.Vx 



where 



J^{x, y) = l^^x{l + ^Y + l - ^x(l-^)2 + l^ 



(32) 



(33) 



and the corresponding Shannon transform yields the desired Z2, in the form 

1 



q\og l+pVx - -,T pVx 



plog [ 1 + qVx - ( pVx, 



In passing, we remark that the "large SNR" (i.e., large Vx) behavior of (34) is 

I2 = mm{p, q} log(l + \p - q\Vx) + 0(1) 



(34) 



(35) 



showing that the pre-log of X2 is the asymptotic almost sure normalized rank of the matrix AUdiag(b), 
as expected. 

2) U is Haar-distributed: If U is Haar-distributed, i.e., uniformly distributed on the manifold of 
n X n unitary matrices, the eigenvalue distribution of R coincides with that of AA^^ = A, i.e., with the 



Bernoulli -p distribution. Using (22i and the relation between the r/-transform and the R-transform in 
Eq. 2.74], we obtain 

Z-1 + ^/{z - 1)2 + Azp 



n-R{z) = TZAiz) 



2z 



(36) 



This allows for the calculation of ( [161 ) with the corresponding fixed point equations (17ai and (17b). 
As far as I2 is concerned, we use 



VnioiVx) = riA{aVx 



P 

l + aVa 



+ 1-P 



in ( 20 1 and solve for a using the first equality, obtaining 

p — V 
"~ uVx{l-p)' 



(37) 
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Replacing in the second equality in ( pOj ), we obtain explicitly v as 

Vxip - g) - 1 + ^{Vx[v -q)- ly + ^pVxjl - q) 
2V,{l-q) 



(38) 



It can be checked that < < p for any Vx > and p, q in [0, 1]. Using (37 1 and (38 1 ( 19 1, we obtain 

l2=qlogil + i^Vx)+d{p\\i^) (39) 

where 

d{a\\b) = alog ^ + (1 - a) log ^ (40) 



is the binary relative entropy. The expression ( 39 1 coincides with the result given in Il37ll for the limit of 
the mutual information rate 



-I(x; AUBx + z| A, U, B) = -E 

n n 



log 



I + T'xU'f AUB 



(41) 



of a vector Gaussian channel with iid Gaussian input x, and channel matrix AUB with B = diag(b). 



3) A = I, unitary U; In this case, R = UtA+AU = I and 7^r(z) = 1. Hence, ( [T7a| ) and ( |l7b| 
become 



rj 
X 



1 



Vx 



Since A = I implies p = 1, ( [38] ) yields 1^ = 1 and (recalling ([39]l), we have 

I = I{Vo;Vo + Z)-qlog{l + Vx) 

= I{Vo;Vo + Z)-I{Vo;Vo + Z\Bo) 

= I{Bo;Vo + Z) 

= h{q)-H{Bo\Vo + Z), 



(42) 
(43) 

(44) 
(45) 
(46) 
(47) 



where ( [461 ) follows because Vq + Z and Bq are independent conditioned on Vq. In fact, in this case, the 
single-letter expression I = -I{h; y| A = I, U) holds for all n, not only in the limit of n — >• oo. 



III. Bounds on the Mutual Information Rate 

A. Upper Bounds 

We start with the following result, which follows immediately from first principles. 
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Theorem 3 IfXJis unitary, then Q satisfies 

T<I{Vo;Vo + Z)-q\og{l+V.), (48) 
where Z and Vq are as defined in Claim^ Equation (48 1 holds with equality for A = I. 



Proof: It is sufficient to notice that the output y in ([T]) is obtained by sampling the vector UXb+z at 
the positions of the "1" elements of the diagonal of A. From the data processing inequality and noticing 
that ^/(b; UXb + z) is given by (|44]), the result follows. ■ 
In the general case, we have the following upper bounds 



Theorem 4 



< Vniqn,) (49) 
Ii<l(Vo; V^imVo + z) (50) 



where Z and Vq are as defined in Claim [7] and where |Rp is a random variable distributed as the 
limiting spectrum of R. 

Proof: See Appendix |D] ■ 
B. Lower Bounds 

In order to corroborate the exact result of Claim [1] obtained through the heuristic replica method, we 
also consider a lower bound to the mutual information. Since Z2 is known exactly, it is sufficient to have 
a lower bound for Ti. This is provided by the following result: 

Theorem 5 The mutual information rate in is lower bounded by 

:ii > / (v^o; V^im^Vo + z) d(3, (5i) 

where Z and Vq are as defined in Claim^and where ri{s; (3) is defined by 



n— ^-oo 

where i = [n(3\. 



v(s;/3)= lim ujAf I + sAU,_iUt , A"f 

n— >-oo 



Au, (52) 



Proof: See Appendix ID] 



It is interesting to notice that the quantity defined in (52) can be interpreted as the asymptotic (in n) 



multiuser efficiency of a CDMA system r = AUv + z with input v, output r and spreading codes given 
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by the columns of AU, where the receiver uses Unear MMSE detection with successive decoding, and 
the input symbols Uj+i, ■ ■ ■ ,Vn have been already decoded and subtracted from the received signal (see 



11381 . II331 ). Hence, the integral in (51 1 can be regarded as the mutual information between the input v 
and the output of a mismatched successive interference cancellation receiver that treats the symbols of 
V as if they were Gaussian iid, instead of Bernoulli-Gaussian. 

Explicit expressions for r/(s; /3) can be provided in several cases of interest. For example, when U 
has iid entries, using ll38l Theorem 2.52] we obtain r]{s; /3) = r], given by the solution of the fixed-point 
equation 



namely. 



^ = TTJ^^ (53) 



In the case of Haar-distributed U, using ll38l Eq. 3.112] we obtain r/(s; /3) = rj, given by the solution 
of the fixed-point equation 

77 p 



(55) 



1 + sri 1 + /3s + (1-/3)57?' 
namely, 

, (p - /3)s - 1 + - P)s - 1)2 + 4(1 - P)ps 

Using the mean- value theorem in (51 1, there exists some /3* G [0, 1] such that 

I (vb; ^/r]{qV^;P) + z) dp = I (1/0; VviqV,;P*) Vo + (57) 



which is in the same form as the upper bound ( 50 1 save for a different signal-to-noise ratio between the 
Bernoulli-Gaussian input and the Gaussian noise. 

It is also immediate to notice that the upper and lower bounds on li hold for any fixed deterministic 
U, provided that the limits exist. For example, in the case of U = F, a deterministic unitary DFT matrix. 



II371 shows that ri{s; /3) takes on the same form (56 1 as well as the exact expression for I2 is still given 
by Theorem [2] Hence, it follows that while at the moment we can develop the replica analysis only for 
U random, satisfying the freeness requirement as said above, the mutual information for a deterministic 



DFT matrix satisfies the same bounds. In fact, we have numerical evidence (see Section |IV-F[ ) that leads 
us to conjecture that the replica result of Claim [T] applies also to a DFT sensing matrix, although the 
proofs of this paper do not extend to this case. 
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C. High-SNR Regime 

Theorem 6 For the observation model ([7]) and any support estimator, D{p, q, Vx) is bounded away from 
zero for < p < q, even in the noiseless case. 



Proof: From (14i it is evident that D{p,q,Vx) is bounded away from zero if X < h{q). From 



the definition of the mutual information rate I (see (|7])), it is immediate that X < h{q) for any finite 
Vx- However, in the Umit of high SNR, X may or may not converge to h{q) depending on the system 
parameters p and q. In the remainder of the proof we show that 

lim X < h{q) (58) 

P^— s>oo 

provided < p < q. The case p = is trivial. 
Recall from Theorem |2] that 

VRiaVx) = (59) 

+ l-q (60) 



1 + uVx 

M'Px) = Vn{arx) + qlog{l + uVx)-log{l + auVx) (61) 

where we have made explicit the dependence of X2 on Vx- For the purposes of the proof it is important 
to elucidate the behavior of aVx, ^Vx, and avVx as "P^ — )• 00, where v and a depend on Vx through 



( |60| ). In principle, there are nine possibilities: 

1) aVx and vVx 0. 

2) aVx — ^ and < lim-p^_^oo i^'Px < 00. 

3) aVx —J" and uVx diverges. 

4) < lim-p^,_i.oo cuVx < 00 and uVx — ^ 0. 

5) < lim-p^^oo OiVx < 00 and < limp^_j.oo i^'Px < 00. 

6) < lim-p^,_i>oo OiVx < 00 and uVx diverges. 

7) aVx diverges and uVx — 0. 

8) aVx diverges and < limp^_j>oo i^'Px < oo- 

9) aVx diverges and uVx diverges. 
The asymptotic behavior of ([6T]) is 



V^^oo logVx 

since ^rank(AUB) — )• mm{p,q} with probabilty 1. 



lini = p (62) 
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In view of (59 1, avVx cannot diverge when p < 1, since 



(63) 



where the lower bound is the limit of T/R(a7^x) if aVx ^ oo while the upper bound is the limit of 
m.ia'Px) if aPx^O. 



1) Impossible because it would contradict (62 1. 



2) Impossible because it would contradict (60 1. 



3) Impossible because it would contradict (60 1 since g > 0. 



4) Impossible because it would contradict (62 1. 



5) Impossible because then av Vx — ^ and ( |60l ) would be contradicted. 

6) Impossible ii p < q since 'q^{aVx) = 1 — q would be outside the range established in ( |63| ). If 
p = q then the lower limit in ( [63l ) would be achieved at a finite argument of rjn which is impossible 
due to the strictly monotonic nature of that function. 



7) Impossible because it would contradict (60 1. 



8) Impossible if p = q because it would contradict (60 1. The case p < q is treated below. 



9) Impossible if p < q because it would contradict ( [601 ). The case p = q is treated below. 
We proceed to consider case[8]l when p < q. The solution of the fixed-point equation (59 l-([60l) yields 



lim „ 

V^^oo 1 + uVx 

lim 



Px^oo 1 + ai^Vx 
lim vS 

lim a 



q-p 

1 — p 

P 

q-p 
Q-P 
1-p 



We can proceed to upper bound X using Theorem [4] and (|64])-([67]): 

X < VniqVx) - VniaVx) - glog(l + lyVx) + log(l + auVx) 
1 



(1 -p) log 
(1 - q) loj 



1 — p 
1 



{q - p) log 



{q - p) loj 



q-p 

qC^-p) 

q-p 



1 — p 

< (l-9)log— h (q - p) log - + {q-p) log 

T " q 1 — p 



p 



< (1 - g) lo^ 
= h{q) 



1 

h g log - 

1-q q 



(64) 

(65) 
(66) 
(67) 

(68) 
(69) 

(70) 

(71) 

(72) 
(73) 
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We now proceed to consider case |9]) when p = q. In this case, the solution of the fixed-point equation 



(59i-(m yields 



lim = 1-q (74) 

Px^oo 1 + auVx 

lim auVx = — - — (75) 
As before, we can now proceed to upper bound X using Theorem [4| 

X < Vn{qVx)-VniaVx)-qlogil + i^Vx) + logil + ai^Vx) (76) 
= {VniqVx) - q log(l + qVx) - VniaVx) + q log(l + aVx)) 

(1 + aVx){l + vVx) 

^ (l-(?)log-^ (78) 
l-q 

< h{q) (79) 



where (78 1 follows from (74), (75 1 and the fact that the first term in the left side vanishes as Vx oo. 



Note that an achievability counterpart to Theorem [6] in the noiseless case (under a more general signal 
model) is given in [41], showing that p = g is the critical sampling rate threshold for exact reconstruction. 

D. Examples 

We provide a few numerical examples illustrating the results developed before. Figs. [T] |2] and [3] show 
the mutual information rate X as a function of the sampling rate p, for a Haar-distributed sensing matrix 
U and a Gaussian-Bernoulli source signal v with q = 0.2 and SNR = qVx is equal to 0, 20 and 50 dB, 
respectively. Each figure show also the corresponding lower and upper bounds provided by Theorems [3j 
m and ID We notice that the lower bound of Theorem [5] is close to the exact value of T for low SNR 
(in fact, it is tight for Vx 0). In contrast, for high SNR, the mutual information X is very closely 
approximated by the minimum of the two upper bounds provided by Theorem [3] and ( [49] ) in Theorem |4] 

It is also interesting to observe that the asymptotic regime of vanishing D{p,q,Vx) for any p > q 
is approached very slowly, i.e., an impractically high SNR is required. For example, we notice that at 
SNR = 50 dB the mutual information X in Fig. |3] achieves the upper upper bound of Theorem [3] (very 
close to h{q)) at p = 0.24, which is quite far from the threshold q = 0.2. Fig. |4] shows Xub evaluated at 
q = 0.2, p = 0.205 versus SNR in dB. In order to reach the value h{q) = 0.722 bits, we need an SNR of 
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Fig. 1. Mutual information rate X versus p, for g = 0.2 and SNR = qVx = dB. Upper and lower bounds are also shown 
for comparison. 



about 340 dB. This gives an idea of "how high" the high-SNR regime must be, in order to work closely 
to the noiseless reconstruction threshold. 



Next, we take a closer look at the behavior of the solutions of the fixed-point equation (17a I - (17b I. 
Even in the iid case (in which the equation reduces to ( [25] )) solved in ifTSll . |[28l . the question of how to 
choose among the multiple solutions has not been thoroughly addressed in the literature. Fig. [5} [6] and |7] 



show the fixed-point mapping function obtained by eliminating x from (17ai - (17bi, and given by 

/(i/^) = — 7 r r^^' (^^^ 

TIr -mmseVb|Vb + ?? 



given as a function of 1/r/, for q = 0.2 and SNR = 50 dB. The intersections of this function with the 
main diagonal are the solutions of the equation l/r? = /(1/r/). We explore the values of -p in the vicinity 
of the "phase transition" -p 0.24, for which the mutual information reaches a value very close to 
(corresponding to D{p,q,Vx) ~ 0). For p = 0.23 (see Fig. |5]l we have three solutions. Two are stable 
fixed points and one is an unstable fixed point. The solution corresponding to the absolute minimum of 



the free energy Ii is the right-most fixed point (see Fig. 5(c)), corresponding to a large value of l/rj. 
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Mutual information @ SNR = 20clB, q = 0.2, Haar sensing matrix 




/ '' Theorem 5 



0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 

sampling rate (p) 

Fig. 2. Mutual information rate X versus p, for q — 0.2 and SNR = qVx = 20 dB. Upper and lower bounds are also shown 
for comparison. 



which in turn translates into a large support recovery error rate, as we will see in Section IV-F For 
p = 0.24 (see Fig. |6]l we have also three solutions of which two are stable fixed points. However, now 
the solution corresponding to the absolute minimum of the free energy Xi is the left-most fixed point 



(see Fig. 6(c)), corresponding to a small value of l/i], i.e., to a very small support recovery error rate. 
This "jump" from the right-most to the left-most stable fixed point corresponds to a phase transition of 
the underlying statistical physics system. Notice that the phase transition may occur at finite SNR, as 
in this case, and the phase transition threshold p* is, in general, strictly larger than the noiseless perfect 
reconstruction threshold q. Finally, for values of p significantly larger than the phase transition threshold 
(see the example for p = 0.33 given in Fig. [7]) only one solution exists. In this case, the free energy Xi 



has only one extremum point which is its absolute minimum (see Fig. 7(c) I. For the Gaussian iid sensing 
matrix case it is known (see IIBTI and references therein) that the iterative algorithm known as AMP- 
MMSE achieves the right-most fixed point of (TTaj) - (17b). This coincides with the optimal MAP-SBS 
performance when this is the valid fixed point, corresponding to the minimum of Xi . Instead, when there 
are multiple fixed points and the left-most fixed point is the valid one, the MAP-SBS estimator is strictly 
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Mutual information @ SNR = SOdB, q = 0.2, Haar sensing matrix 




sampling rate (p) 

Fig. 3. Mutual information rate X versus p, for g = 0.2 and SNR = qP^ = 50 dB. Upper and lower bounds are also shown 
for comparison. 

better than AMP-MMSE. Our results lead us to believe that the same behavior holds for a more general 
class of sensing matrices, as studied in this paper. From the examples above we notice that the right-most 
fixed point is the valid one for p below the phase transition threshold. Above that threshold, either there 
is only one fixed point, for sufficiently large p, or one has to choose the solution that minimizes the free 
energy. 

IV. Analysis of Estimators using the Decoupling Principle 
A. Decoupling principle 

The decoupling principle introduced by Guo and Verdii |[T8]| states that the marginal joint distribution 
of each input coordinate and the corresponding estimator coordinate of a class of, possibly mismatched, 
posterior-mean estimators (PMEs) converges, as the dimension grows, to a fixed input-output joint 
distribution that corresponds to a "decoupled" (i.e., scalar) Gaussian observation model. The observation 
model treated by Guo and Verdii in lITSl is y = SFx + z, and the goal is to estimate x from y, while 
knowing S and F, where x is an m x 1 iid vector with a given marginal distribution, z is the iid Gaussian 
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Fig. 4. Mutual information upper bound (right-hand side of ^6^) versus SNR — qVx (dB), for q = 0.2 and p — 0.205. 



noise vector, S is a random n x m matrix with iid elements with mean zero and variance 1/n, and T is 
an m X m diagonal matrix whose diagonal elements have an empirical distribution converging weakly 
to a given well-behaved distribution. Comparing the model of [18] with ([T]l, we notice that as far as 
the estimation of the BemoulU-Gaussian iid vector v = Xb the two models are similar, by identifying 
S with AU, r with I and x with v, with the key difference that we allow a more general class of 



matrices satisfying the freeness condition given at the beginning of Section I-A In contrast, as far as the 



estimation of b is concerned, our model differs from ifTSll in that in our case the diagonal iid Gaussian 
matrix X is not known to the estimator. 

In this section, we apply the decoupling principle to the estimation of b for the observation model ([T]). 
This allows us to derive the minimum possible support recovery error rate for any estimator, achieved 
by the MAP-SBS estimator. The details of the derivations are given in Appendix [E| and the main results 
are summarized in the remainder of this section. We also consider linear MMSE and Lasso |36|, two 
popular estimators in the compressed sensing literature. These estimators first produce an estimate of v 
and then recover an estimate of the support b by component wise thresholding. In order to analyze the 
suboptimal estimators, we resort to the decoupling principle for the estimation of v, which can be derived 
along the same lines as Appendix |E] or, equivalently, by extending the analysis of ifTSl to the class of 
sensing matrices considered in this paper. In IIBTI . linear MMSE and Lasso estimators are studied for 
the case of iid sensing matrices as special cases of the Approximated Message Passing (AMP) algorithm 
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Mapping function @ SNR = 50 dB, q = 0.2, p = 0.23 



Mapping function @ SNR = 50 dB, q = 0.2, p = 0.23 
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Free energy @ SNR = 50 dB, q = 0.2, p = 0.23 
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l/r, 

(b) Detail near I/77 = 

Free energy @ SNR = 50 dB, q = 0.2, p ^ 0.23 




Free energy relative minimum 
corresponding to the invaiid Fixed-point 



(d) Free energy detail near 1/77 = 

Fig. 5. (a) Mapping function for ttie fixed-point equation l |17a[ l - l |17b^ for g = 0.2, p — 0.23 and SNR = 50 dB. (b) Detail 
in order to evidence tlie unstable fixed point and the left-most fixed point, (c) Corresponding free energy, (d) Detail of the free 
energy for small l/?7 in order to show the minimum corresponding to the left-most fixed point. 



ifm . the performance of which is rigorously characterized for U with iid Gaussian entries in the large 
dimensional limit through the solution of a state evolution equation lO. The current AMP rigorous 
analysis does not go through for the more general class of matrices considered here. Therefore, we resort 
to the replica + large deviation approach of Rangan, Fletcher and Goyal |[28l in order to obtain the 
decoupled model corresponding to these estimators. Interestingly, when particularizing our results to the 
iid case, we recover the same AMP state evolution equations as given in |31J. 

For the sake of notation simplicity, we shall assume that all random variables and vectors appearing 
in the following formulas have a density (possibly including Dirac distributions), indicated by p with 
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Mapping function @ SNR = 50 dB, q = 0.2, p = 0.24 
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Fig. 6. (a) Mapping function for the fixed-point equation l |17a[ l - l |17b^ for q = 0.2, p = 0.24 and SNR = 50 dB. (b) Detail 
in order to evidence the unstable fixed point and the left-most fixed point, (c) Corresponding free energy, (d) Detail of the free 
energy for small I/77 in order to show the minimum corresponding to the left-most fixed point. 



the appropriate subscripts and arguments. In order to limit the proUferation of symbols, we use the 
same symbols to indicate random variables (or vectors) and the corresponding dummy arguments in the 
probability distributions. 

The class of estimators for which the decoupling principle holds are mismatched PMEs where the 
mismatch is reflected in an assumed channel transition probability and symbol a priori probabilities that 
may not correspond to the actual ones. We shall reserve the letter q with the appropriate subscripts and 
arguments to indicate these assumed distributions. The true conditional channel transition probability of 
y given b, A, U, X of ([T]) is given by ( [T8| ). The corresponding assumed channel transition probability is 
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(c) Free energy (d) Free energy detail near l/rj = 

Fig. 7. (a) Mapping function for the fixed-point equation \na\ - \nb\ for q — 0.2, p = 0.33 and SNR — 50 dB. (b) 
Corresponding free energy, (d) Detail of the free energy for small l/?7 in order to show the minimum corresponding to the 
left-most fixed point. 



given by 

gy|b,A,u,x(y|b,A,U,X)= (^)%xp(-7||y-AUXbf), (81) 

where the assumed noise variance is I/7 instead of 1. We let also %(b) = Yli^iQbibi) denote an 
assumed a-priori distribution for b, not necessarily Bernoulli-g. The mismatched estimator for b given 
y, A, U is given by The corresponding PME takes on the form 

b(y, A, U) = / b (7b|y,A,u(b|y, A, U) dh, (82) 
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where 

/, I A TTA /9y|b,A,u,x(y|b, A,U,X)gb(b)px(x)dx 

%|y,A,u D A, U = J — — — , (83) 

J 9y|b,A,u,x(y|b', A, U, X)gb(b')Px(x)(ix(ib' 

and where px(x) = ^-^^ .^^ exp(— ||x|p/Pa;) is the n-variate iid Complex Gaussian density with compo- 
nents ~ CM{0,Vx)- 

In the matched case, for 7 = 1 and gb(b) = BernoulU-g, (82) coincides with the MMSE estimator. |^ 
By considering general 7 and (/b(b), we can study of a whole family of mismatched PMEs through the 
same unified framework ||35]| . ifTSll . 

For the purpose of analysis, it is convenient to define a virtual multivariate observation model in- 
volving the random vectors bo ~ pbo(bo)> Bernoulli-g, the corresponding observation channel output 
y = AUXbo + z as in ([T]l, and an intermediate vector b ~ 'Zb(b), not corresponding to any physical 
quantity present in the original model, such that the conditional joint distribution of bo, y, b given A, U 
is given by 

Pbo(bo) Py|bo,A,u(y|bo, A,U) gb|y,A,u(b|y, A,U), (84) 

with 

Py|bo,A,u,x(y|bo, A, U, X) = exp (- ||y - AUXbof ) . (85) 
Then, b(y, A, U) can be seen as the "matched" PME of b given y with respect to the joint probability 



distribution ( |84| ). Notice also that ( |84| ) satisfies the conditional Markov Chain bo — )• y — )• b, for given 
A,U. 

The decoupling principle obtained in this paper and proved in Appendix [E] can be stated as follows. 
Let {boi,bi,bi) denote the z-th components of the random vectors bo, b, b(y, A, U), obeying the joint 



conditional distribution (84 1 with b(y,A,U) given in (82). Then, in the limit of n — )• 00, under the 
assumption that the replica-symmetric analysis holds (see Appendix|E|, the joint distribution of {boi, h, bi) 
converges to the joint distribution of the triple {Bq, B, B) induced by 

PBoibo) Py|Bo;r?(y|&0) qB\Y;i{b\y), (86) 

and by -B = J 6 <lB\Y;imy)di', where we define the decoupled channel 

Y = Vo + v~"-Z, (87) 



^This is the PME for the matched statistics, which effectively minimizes the MSE. 
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with Z ~ C7V(0, 1) and I/q = X^B^, with Bq ~ PBo{bo), BemoulU-g, and with Xq ~ CA/'(0,P^), and 
where Xq,Bq and Z are mutually independent. Also, we define V = XB with X ~ CM{Q,Vx) and 
i3 ~ (7b (6) identically distributed as the the marginals of the assumed prior distribution gb(b). We let 
Px{-) denote the common density of Xq and X, and define the following probability densities for the 
variables Vq, Y, V, Bq and B: 

(88) 
(89) 
(90) 
(91) 

(92) 



PY\Vo;v(yM = 


- exp \^-r] \y - Vol ) 


PY\Bo;viy\bo) = 


J PY\Vo;riiy\xobo)Pxixo)dxo 


QY\vAy\'") = 




QY\B;^{y\b) = 


J QY\V;^iy\xb)px{x)dx 


QB\YAb\y) = 


(lY\B;i{y\b)qB{b) 


IqY\BAyW)<lBib')db'' 



where the parameters i] and ^ are obtained by solving the system of fixed-point equation: 

X = 7mmse(y|y) 

e = 7^r(-x) 
V = 



ih + n^{-x){5-x) 



(93a) 
(93b) 
(93c) 
(93d) 



The expectations in ( |93a[ ) - ( |93d| ) are defined with respect to the joint distribution of Vq,Y,V given by 

PVo(«o) PY\Vo;viyH) <lV\Y;^{v\y), (94) 



where pvo{vo) is the BernoulU-Gaussian distribution of Vb = XqBq, PY\Vo;ri{y\vo) is given in (88 1 and 
where 

(iY\v;i{y\v)qv{v) 



qv\Y-i{v\y) 



QY-Ay) 



with qY\V;^{y\'^) given in (90 1, qv{v) is the distribution of V = XB, and 

qY-^y) = / qY\v-dy\v)qv{v)dv. 



(95) 



(96) 



In passing, notice also that (86 1 and (94i satisfy the Markov Chains Bq ^ Y ^ B and Vq ^Y ^ V, 
respectively. 



''We use the dot notation f{x) to denote the first derivative of a single-variate function / with respect to its argument. 
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If the solution to ( |93a[ ) - (93d]) is not unique, then we have to select the solution that minimizes the 
system "free energy" (expressed in nats): 

^ = log - - - + 7 - ex + f - - — + n^{-w)dw - E [log {qY;i{Y))] . (97) 
IV \V J 1 Jo 



As expected, by letting 7 = 1 and qsib) BernouUi-g we obtain £, = 1] and 5 = x ^^d (93ai - (93di 



reduce to (17ai - (17bi. It is also immediate to see that in this case we have £ = Ii + log(7re) where 
Xi is given in ([T6]). 



By particularizing our analysis to the case of U with iid elements, using (24 1, we obtain the simpler 
fixed-point equations 

1 _ 1 
V P 

- = -(- 

C P \1 



l+E[\Vo-E[V\Y]\^]) 
+ mmse(F|y) 



(98a) 
(98b) 



which recovers the results of |[T8l . ||28]| . |[T5l up to a different normalization as discussed in the first 



example of Section II-D 



B. Symbol-by-symbol MAP estimator 

As an application of the decoupling principle, we can determine the minimum achievable D{p,q,Vx) 
by particularizing the above formulas for the MAP-SBS estimator of bi given y, A, U, operating according 
to the optimal decision rule 



6,(y,A,U) = arg max P[6i = 6|y, A, U]. 

b6{0,l} 



(99) 



It is well-known that the MAP-SBS minimizes the support recovery error rate over all possible estimators. 
A byproduct of the decoupling principle is that, in the matched case, ( [86] ) yields immediately that the 
limiting posterior marginal P[6j = 6|y, A,U] for a randomly chosen i-th component of b is given by 
PBo\Y\ri{bo\y), the posterior distribution of the decoupled channel (87i, marginalized with respect to Bq. 



In the matched case, (93a) - (93d) reduce to (17ai - (17bi in Theorem [T] and PBo\Y;r]{bo\y) is easily 
obtained by noticing that Y given Bq is conditionally distributed as 



PY\B„;ri{y\bo) 



i.e., Y ^CM (0, Vx + l/r]) for = 1 and F ~ CN (0, 1/??) for Bq = 0. Then, 

¥[Bq = l\Y = y] 



1 + + riV.,) exp {-r^VMy?) ' 



(100) 



(101) 
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(obviously P[Bq = 0\Y = y] = 1 —F[Bq = 1\Y = y]) where r] is obtained from ( 17a) - ( 17bl and where 
we define: 



/^ = T-V- (102) 



The resulting MAP-SBS estimator is 



S(y) = arg max F[Bo = bo\Y = y], (103) 
feoe{o,i} 

with decision B{y) = 1 if 

^(1 + r]V,) exp {-r]V,^\y\^) < 1 (104) 

(with randomization on the boundary). Taking the logarithm of both sides, we find the "energy detector" 
(analogous to non-coherent on-off modulation with fading) given by 

. 1, for IvP > r 
B{y) = { (105) 



with 



0, elsewhere 



1 



v'Pxfj' q 

We have B{y) = 1, regardless of the value of y G C, if g > 2+^" ' which case D{p,q,Vx) = 1 — q. 
Otherwise, 

D{p, q, Vx) =q{l- exp (-/ir)) + {1 - q) exp (-r?r) , (107) 



obtained from (105 1 by observing that \Y\'^, conditioned on Bq, is central chi-square with two degrees 
of freedom with mean "P^ + l/r^ for i?o = 1 and with mean l/rj for Bq = 0. 



For U with iid elements, we can recover known results. In this case, ( |17a[ ) - ( |17b[ ) reduce to p5] ), 
which corresponds to the replica analysis of the MMSE estimator obtained in lITSl and summarized in 
lISH in the context of support recovery in compressed sensing. When the iterative solution of the fixed- 



point equation (25 1 is initialized by l/i] = {l + qVx)/p, then the iteration converges to the solution of the 



so-called "AMP-MMSE" state equation given in Oil Th. 6]. In brief, by this initialization the iterative 
solution converges always to the right-most fixed point of the mapping function (see Figs. [5] - [7] and 
related discussion). Instead, if the valid fixed-point is chosen, i.e., the solution which minimizes the free 
energy Zi, then we obtain the so-called "replica MMSE solution" of |[3TI Th. 8]. 

Next, we discuss the threshold for perfect support reconstruction in the noiseless case, i.e., in the limit 
of Vx —J" 00, and q > 0. From Theorem |6] we already know that vanishing D{p, q, Vx) cannot be achieved 
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for any p < q. We now show that D{p,q,T'x) vanishes for large Vx for all p > q. This has previously 
been shown for both optimal nonlinear measurement schemes and for Gaussian iid sensing matrices in 
H3l . Therefore, the conclusion about the asymptotic optimality of Gaussian iid sensing matrices found 
in ll43l extends to sparsely sampled free random matrices. We start by recalling the following general 
result from H2l: 



Theorem 7 Let V is a discrete-continuous mixed distribution, i.e. such that its distribution can be 
represented as 

v = {\- p)vd + pvc. (108) 

where is a discrete distribution and is an absolutely continuous distribution, and < p < 1. Then, 
for Z ~ CAA(0, 1) we have 

mmse(1/|\/snry + Z) = — + ( — ) . (109) 

snr Vsnr/ 



We are interested in the behavior of the SNR of the decoupled channel ( 87 1 resulting from the MAP- 
SBS estimator, given by qr]Vx, as Vx — > oo. In particular, for given sparsity < (7 < 1, we are interested 
in determining the range of sampling rates p for which qrjVx ^ 00, implying that D{p,q,Vx) — ^ 0. Let 
Z and Vq be as defined in Claim [1] Then, using Theorem [7] we can write 

mmse{Vo\Vo + ri"^Z^ = Vx mmse {Vq/ V^l^/V^Vo/ V^x + z'^ (110) 

= (111) 

V 

where, for the time being, we assume that VxV grows unbounded as Vx — 00. Using ( |111[ ) into (17ai - 



(17b I, for sufficiently large Vx we have 

7? = 7^R(-mmse(yo|l"o + ??'^)) (112) 



^r(-^). (113) 



For the case of U with iid elements, using (24) we obtain 



^r(^) = 1-- (114) 



and solving (113 1 with respect to r], we obtain 

rj = p — q. (1 15) 
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In the case of Haar-distributed U, using (36 1, we obtain 



z 



V 



{l-z)z 
p-q 



l-q 



(116) 
(117) 



For p > q, in those two cases the solutions are strictly positive and, consequently, the support recovery 
error rate vanishes as the SNR grows without bound. In fact, as we show next, this conclusion holds for 
the general class of sparsely sample free random matrices. 

The goal is to show that limp^_5.oo r] > for p > q, without relying on a closed-form expression 
for the R-transform. This implies that D{p,q,Vx) vanishes for large Vx for all p > q. Assuming that 



(113 1 holds, using the definition of the R-transform as function of the r/-transform given in [38, Eq. 2.75 
Sec. 2.2.5] and the definition of r/-transform as given in [38, Sec. 2.2.2], we can rewrite the asymptotic 
equality rj = TlB.{—q/ri) as: 

1 



1 - E 



1 + s|R|2 



where s satisfies 



E 



l + s|R|2 



(118) 



(119) 



and |Rp denotes a random variable distributed as the limiting spectrum of R. 



By eliminating q and solving for t] in (118 1, (119 1 we obtain 



V 



E 


r iRp 1 


_l+s|RP_ 


E 


1 


_l+s|RP_ 



(120) 



It is immediate to see that ( 120[ ) is strictly positive for any finite s (ranging from the mean to the harmonic 
mean of |Rp). In view of Property (262 1 of the ?7-transform, 

1 



l-p<E 



l + s|R|2 



< 1, 



(121) 



we conclude that (118 1 admits a unique positive and finite solution s if and only if 1 — g G (1 — p, 1], 
i.e., for p > q. Hence, ( 120[ ) yields r/ > for — oo, as we wanted to show. 

We conclude this section by providing expressions for the MMSE in the estimation of the Bernoulli- 
Gaussian signal V for high SNR. For iid U, we have 



mmse ( VqI^o + rj^Z 



(122) 
(123) 
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while for Haar-distributed U, we have 



mmse[Vo\Vo + v--Z) = -n:^\v) (124) 

= ^"^ . (125) 

(1 - r])r] 



Notice that (123 1 coincides with the result derived in B31 and that the high-SNR MMSE diverges for 
p = q- Since deleting samples cannot improve the performance of the optimal MMSE estimator, it 
diverges for all < p < q. 

C. Replica analysis of a class of estimators via the large-deviation limit 

The classical noisy compressed sensing problem seeks the estimation of the sparse vector v = Xb 
from y in ([T]) for known A, U. Then, b can be estimated by componentwise thresholding the estimate 
of V. 

A number of suboptimal low-complexity estimators in the compressed sensing literature take on the 
form 

V = arg mm | 7 IIy " AUvf + fi'^i) \ ' (126) 



i=l 



for some weighting parameter 7 > and cost function / : C — IR+. 

The replica decoupling principle can be used to study the large-dimensional limit performance of such 
class of estimators by following the large-deviation recipe given in ||28l . Briefly, the approach of f28l 
considers a sequence of mismatched PMEs indexed by a parameter k € M+, where the assumed a priori 
density for v takes on the form 

J exp(-K^^^-^/(zi))dz 

(assuming that the integral converges for sufficiently large k), and where the assumed transition density 
is given by 

'?Si,A,u(y|V'A,U)= (^)"exp(-7K||y-AUvf) . (128) 



Under a number of mild technical assumptions (see ESl for details), v in (126 1 can be obtained as the 
Umit of the PME 

^^"^ = / vgS,A,u(v|y'A,U)dv. (129) 

for K — )• 00. Furthermore, forn — )• 00 and assuming the validity of the replica analysis, a decoupled scalar 
channel model in the limit of k — )• 00 can be established such that the joint distribution of {voi,Vi,Vi) 
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converges to the joint distribution of (Vq, V, V), where the form of the joint distribution of Vq, Y, V is 



again given by (94) and where F is a function of Y. The form of the fixed-point equations yielding r] 
and ^ and of y as a function of Y depend on the specific estimator considered, i.e., on the value of 7 
and on the cost function f{v) in ( 126[ ). In particular, following in the footsteps of [28] with a few minor 



variations in order to adapt to our case, |^ it is not difficult to show that V = v{Y; where we define 

v{y;0 = argmin{^|y - v\^ + f{v)} , (130) 
and that the fixed-point equations yielding t] and ^ in the limit of k — )• 00 are given by 

X = jE[a\Y;^)] (131a) 
,5 = E[\Vo-v{Y;0\^] (131b) 



e = 7^r(-x) (131c) 

ai+'^R{-x){s-xy 



V = J (131d) 



where 



^^'^^ v^Hy;OC\y-v\'' + f{v)-[C\y-v{y;0\' + fiviy;0)r 



When U has iid elements, from ( |98a| ) - ( 98b| ) we find 



- = -{l+E[\Vo-v{Y;0\^]) (133a) 

- = -(- 

which coincide with ll28l Eq. (30a) - (30b)], up to a different normalization and the fact that we consider 
complex circularly symmetric instead of real random variables as in [|28l . 



(^i+E[cT2(y;0]) , (133b) 



D. Thresholded linear MMSE estimator 

A simple suboptimal estimator for v is the linear MMSE estimator, given by 

V = [7-^1 + R] aV- (134) 

with 7 = qVx and R defined in ([3]). It is immediate to verify that (134i can be expressed in the form 



(126 1 by letting f{v) = \v 



2 



^Details are omitted since they can be easily worked out from |28| . 
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Although the asymptotic performance and the decoupled channel model of linear MMSE estimation 
can be obtained directly from classical results in large random matrix theory both for iid and for Haar- 
distributed U (see |l33 and references therein), it is instructive to apply the replica large-deviation 
approach outlined before. In this way, we can recover known results obtained rigorously by other means, 
thus lending support to the validity of the replica-based large-deviation approach. 

: I tip we obtain 



Particularizing (130 1 and (132i to the case f{v) 

v{y;0 = 



1 + ^ 



y 



and 



yielding 



c^\y;0 



E[\Vo-viY;0\' 



1 



E 



Vn 



1 + e 



-Y 



7 + 

(1 + 0'' 



(135) 
(136) 

(137) 
(138) 



where we used the fact that E[|Vop] 



qP^ = 7. Replacing (136 1 and (138 1 into (131a) - (131di, we 



obtain the fixed-point equations for the linear MMSE estimator. In the iid case, using (133ai - (133bi, 
we obtain that C = 77/ and 

-(1 + (1 - p)j) + ^(l + (l-p)7)2 + 4p7 



V 
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(139) 



which coincides with the well-known expression of the multiuser efficiency of the linear MMSE detector 
for an iid matrix with aspect ratio pn x n and elements with mean and variance 1/n (see [38 1 and 



expression (54i evaluated for /3 = 1, s = 7). 



In the Haar-distributed case, using (36 1, we can solve explicitly for ^ by eliminating x in (131ai and 



(131c I. After some more complicated algebra than in the iid case, we arrive at the solution 



IP 



(140) 



l + (l-p)7 

We also find that, as in the iid case, ^ = 77/. Hence t] is given in closed form as 

= P 

l + (l-p)7' 

which coincides with the well-known form of the multiuser efficiency of the linear MMSE detector for 
a CDMA system with observation model r = AUv + z, where U is n x n Haar-distributed, given by 



(141) 



the solution of (55 1 in the case /3 = 1, s = 7 (or, equivalently, by the limit of (56 1 for /3 — )• 1). 
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In order to calculate the performance of the thresholded linear MMSE estimator, notice that the 
estimator output converges in distribution to V = v{Y;^) = where, according to the decoupled 

channel model, Y = Vo + r]~2 Z, and Z ~ CA/'(0, 1). Thresholding y or y is clearly equivalent. Hence, 
the support recovery error rate in this case takes on the same form already derived for the MAP-SBS 
(see ( 105 1 - ( 107 1), for a different value of r] calculated via ( 131al - ( |131d[ ). 



E. Thresholded Lasso estimator 



We now follow an approach similar to that in Section IV-D in order to analyze the Lasso estimator, 
which so far has only been analyzed for iid sensing matrices. 

The Lasso estimator, widely studied in the compressed sensing literature fl4l . ||9l comes directly in 



the form ( 126 1 for f{v) = \v\. In this case, the parameter 7 must be optimized depending on the target 
performance. For example, in the classical noisy compressed sensing problem we are interested in the 
value of 7 that minimizes E[||v — v|p]. 



Particularizing (130 1 and (132i to the case f{v) = \v\ we obtain 

1 



v{y;0= \y\ 

where [•]+ takes the positive part of its argument, and 

^\y;0 = ^l\y 



2e. 



y_ 
\y\ 



1 1 1 



(142) 



(143) 



where 1{-} is the indicator function of the event inside the brackets. Notice that ( 142 1 and ( 143 1 generalize 
the expressions found in |[28l to the complex case. In this case, we have 



E[\Vo-viY;0\^] 



E 









Y 


2' 




Vo- 

















q'Px + 



1 



-»?' 



+ - 



V 



'vrr/'erfc 



erfc I V n' 



(144) 



where rj' = 77/(4,^^), fi' = ^/(4^^), and /i is defined in (102i. The derivation of (144i is not completely 
straightforward and it is provided in Appendix IH] 



From ( 143 1 we have 



E[a\Y;0] 



\Y\ > 1/(20] 



qe 



+ (1 - q)e-'^' 



(145) 
(146) 
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Replacing (144i and (145 1 into ( |131a i - ( 131d[ ), we obtain the fixed-point equation for calculating the 



decoupled channel parameters rj, ^ for the analysis of the Lasso estimator for given parameter 7. In 



the iid case, using (133ai - (133b), we obtain the same system of equations given in [28], up to a 
different normalization and the fact that here we consider complex signals. Furthermore, it is immediate 
to recognize that ( 133a[ ) corresponds to the state evolution of the AMP with soft- thresholding (AMP- 



ST) as described in |[3TI . where the scalar soft-thresholding function is given by (|142[) for an arbitrary 



thresholding parameter > 0. The large-dimensional analysis leading to the state evolution equation 



( 133a I is rigorously proved in ||3l for the case where U is iid Gaussian. Based on this fact, it is tempting 
to conjecture that the analysis is valid for the general iid case (subject to usual mild conditions on the 
matrix element distribution) and that the replica analysis yields correct results also for the more general 
class of matrices considered in this paper. 

In order to obtain an estimate of b (support of v), a natural approach consists of selecting the non-zero 
components of v. However, this method yields rather poor results in the Bernoulli-Gaussian case and in 
other cases where the magnitudes of the non-zero components of v are not bounded away from zero. 
Instead, in an iterative implementation of the Lasso solver (e.g., using the method in |45|, or the AMP- 
ST), it is possible to generate a "noisy" version of the Lasso estimate v before the soft-thresholding step 



(see Section [TV-F and EH)- This noisy Lasso estimate corresponds to the decoupled channel model with 



marginal distribution Y = Vo + rj 2 Z, with t] given by the fixed-point equation in the Lasso case. Hence, 



the support recovery error rate takes on the same form already derived for the MAP-SBS (see ( 105 1 - 



( |107| )), for a different value of r], calculated via ( |131a[ ) - ( |131d| ) for the Lasso case as explained above. 



F. Support recovery error rate examples 

In order to illustrate the above results and compare the behavior of different support estimators, we 
show some numerical examples and compare the theoretical asymptotic results with finite-dimensional 
simulations. Figs. [S] and |9] show the support recovery error rate D{p,q,Vx) versus the sampling rate p 
for a Haar-distributed sensing matrix U and a Gaussian-Bernoulli source signal v with q = 0.2 and 
SNR = qVx equal to 20 and 50 dB, respectively. 

A few remarks are in order: 



The MAP-SBS asymptotic distortion is obtained by choosing the fixed-point solution of ( 17a) - ( 17b 1 



that minimizes the free energy Zi , as discussed in Section III-D Instead, if we choose only the right 



most fixed point, we obtain the solution of the conjectured state evolution equation corresponding to 
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0.25 r 



Support recovery error rate @ SNR = 20dB, q = 0.2, Haar sensing matrix 
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sampling rate (p) 



Fig. 8. Support recovery error rate D(p,q,'Px) versus p, for q = 0.2 and SNR = qVx = 20 dB for different estimators, 
asymptotic results and and finite-dimensional simulations. Solid thick line: MAP-SBS, asymptotic; Dotted line: Information 
theoretic lower bound; Dot-dash line: Thresholded Lasso, asymptotic; Dashed line: Thresholded linear MMSE, asymptotic; Thin 
solid line: Conjectured AMP-MMSE, corresponding to the right-most fixed point of l |17a^ - ([iTbJ. Some finite-dimensional 
simulations are shown for dimension n — 100 for the thresholded linear MMSE estimator (asterisk: Haar sensing matrix; 
triangle: DFT sensing matrix) and for the thresholded Lasso (lozenge: Haar sensing matrix; star: DFT sensing matrix). 



the AMP-MMSE applied to Haar-distributed sensing matrices. As previously remarked, it is known 

that such state evolution equation is exact in the case of iid sensing matrices. 

The information theoretic lower bound is obtained by taking the minimum of all the upper bounds 



on I developed in Theorems |48] and |4j and using it in ( [T4] ). 
• We run finite-dimensional simulations for dimension n = 100 for the thresholded linear MMSE and 
thresholded Lasso estimators. We considered both random unitary U (Haar distributed) and the case 
of a fixed deterministic U = F, where F is the n-dimensional unitary DFT matrix with elements 
[F]m,fc = " ' — -■ Interestingly, the simulations show that random unitary and deterministic 
DFT yields essentially the same performance (up to Monte Carlo simulation fluctuations). This 
corroborates our conjecture that the asymptotic analysis of Haar-distributed U carries over to the 
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Support recovery error rate @ SNR = 50dB, q = 0.2, Haar sensing matrix 

0.25 I 1 1 1 1 1 1 1 1 1 




sampling rate (p) 

Fig. 9. Support recovery error rate D(p,q,'Px) versus p, for q = 0.2 and SNR = qVx = 50 dB for different estimators, 
asymptotic results and and finite-dimensional simulations. Solid thick line: MAP-SBS, asymptotic; Dotted line: Information 
theoretic lower bound; Dot-dash line: thresholded Lasso, asymptotic; Dashed line: Thresholded linear MMSE, asymptotic; Thin 
solid line: Conjectured AMP-MMSE, corresponding to the right-most fixed point of l |17a^ - ([iTbJ. Some finite-dimensional 
simulations are shown for dimension n — 100 for the thresholded linear MMSE estimator (asterisk: Haar sensing matrix; 
triangle: DFT sensing matrix) and for the thresholded Lasso (lozenge: Haar sensing matrix; star: DFT sensing matrix). 

case of a DFT matrix. The case of DFT matrices is particularly relevant for applications, since in 
many communication and signal processing problems signals are sparse in the time (resp., frequency) 
domain and are randomly sampled in the dual domain, so that a random selection of the rows of a 
DFT matrix arises as a sensing matrix naturally matched to the problem. 

• As already noticed in several works, the gap between the optimal MAP-SBS estimator and the 
suboptimal low-complexity estimators grows for high SNR (compare Fig. |8] and Fig. |9]). In contrast, 
the thresholded linear MMSE estimator yields poor performance for all p < I, and this is quite 
insensitive to SNR. 

• In order to solve the complex Lasso, we used the iterative method of [45 1. This scheme has slightly 
lower complexity than AMP-ST, and provably converges to the Lasso solution. By comparing the 
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component-wise thresholding step in P31 and the symbol-by- symbol estimator v{Y;^) for the 



decoupled channel model given in ( 142 1, it is natural to identify the noisy Lasso solution with 
the vector 

V = + DG"^ fy - Gv^°°A , (147) 



where v(°^) is the solution of the iterative algorithm of fl31 after convergence, G is the matrix 
obtained by taking the non-zero rows of AU, and D = diag(l/||gi|p, . . . , l/||gn|p) where g£ is the 
£-th column of G. The support recovery error rate shown in Figs. [8] and |9] for the finite-dimensional 



simulation of the thresholded Lasso is obtained by applying the threshold detector given in ( 105 1, 



for r] calculated via the asymptotic fixed-point equations (131ai - (131di, to the components of v 



given in ( 147 1. The asymptotic analysis and the finite-dimensional simulation were computed for the 
same value of the parameter 7, which must be chosen for each combination of system parameters 
p, q and Vx- Several heuristic methods for the choice of 7 are proposed in the literature. Following 
B6l . we used 7 = (1/20)||GV||oo (the optimization of 7 for the asymptotic case is an interesting 
topic for further investigation.) 

V. Conclusion 

In the standard compressed sensing model, the sensing matrix AU is such that A is diagonal with 
independent {0, 1} components and U has iid coefficients. In addition to this model, we allow the square 
matrix U to be Haar-distributed (uniformly distributed among all unitary matrices) or, more generally, 
to be free from any Hermitian deterministic matrix. 

Motivated by applications, in this paper we have carried out a large-size analysis of: 

1) the mutual information between the noisy observations and the Bernoulli-Gaussian input (condi- 
tioned on the sensing matrix), 

2) the mutual information between the noisy observations and the Gaussian input prior to being subject 
to random "hole-punching". 

We have obtained asymptotic formulas using fundamentally different approaches for both mutual infor- 
mations: the first following a replica-method analysis whose scope we enlarge to encompass the desired 
class of random matrices, while the second invokes results from freeness and the asymptotic spectral 
distribution of random matrices. 

Depending on the case, the mutual informations are expressed either through the mutual information 
between a scalar Bernoulli-Gaussian random variable and its Gaussian-contaminated version, or explicitly, 
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through the solution of coupled nonlinear equations. We have also studied how to choose among the 
solutions of those equations. 

Our upper and lower bounds on the mutual informations do not rely on the replica method. Yet, they 
turn out to give excellent agreement with the replica analysis. Through the analysis of the bounds we 
also provide a simple converse which shows that the asymptotic distortion is bounded away from zero 
regardless of signal-to-noise ratio for p < q. For p > q, Wu and Verdu [43 1 showed that Gaussian iid 
sensing matrices are asymptotically as effective for compressed sensing as the best nonlinear measurement 
(or encoder). Here, we have been able to extend that conclusion to the class of sparsely sampled free 
random matrices. 

We have analyzed several decision rules such as the optimum symbol-by-symbol rule, the Lasso, and 
the linear MMSE estimator, followed by thresholding for support recovery. Those analyses follow the 
decoupling principle, originally introduced in lITSl for iid matrices. Specializing these new results we 
recover the iid formulas found in lITSl . ll28l . lISTI . with the exception of the ML detector analyzed in 
|[3T|| . which is tailored to the case when the number of nonzero coefficients is known at the estimator, 
while in our analysis that number is binomially distributed. 

The important case where U is a deterministic DFT matrix remains open. However, we have provided 
intuition and simulation evidence to buttress the conjecture that its solution in fact coincides with the 
case where U is Haar distributed. 
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Appendix A 
Proof of Theorem[T] 

Let {X, Y) ~ PxPy\x- For any X e X, such that X ^ X, and function d : A' x ^ ^ [0, oo). 



( [T2| ) and the data processing inequality yield 

R{E[d{X,X)]) < I{X;X) (148) 
< I{X;Y) (149) 
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Supremizing over X and in view of the fact that R{-) is a monotonically non-increasing function, the 
result follows. 

It is worth emphasizing the totally elementary nature of the proof of Theorem [T] and in particular the 
fact that it does not involve any type of operational characterization of information theoretic fundamental 
coding limits. A different approach based on those limits and Fano's inequality is taken in |31] to show 
Lemma 5 therein. 



Appendix B 
Proof of Claim [T] 

We let vq = Xbo with Xdiag(x) and x an iid Gaussian vector with Px(x) = -j„ exp (— ||x|p/Pa;) 



and bo BernouUi-g, with probability mass function Pbo(t>o)- Notations are as in Section IV-Ai. In 
particular, y = AUXbo + z, as in Q. Consider the assumed conditional probability density 

gy|v,A,u(y|v,A,U) = (^)"exp(-7||y-AUvf) (150) 



for some 7 > 0. We also consider an assumed iid prior density on v, denoted by g{-v) for simplicity of 
notation, and let go{vQ) denote the Bernoulli-Gaussian density of vq. Removing the conditioning with 
respect to v, we obtain 

'?y|A,u(y|A,U) = (^)"y" <7(v)exp(-7||y-AUvf)dv. (151) 

We wish to calculate the mutual information rate Ii defined in ([8]l, which can be expressed as 

Xi = lim -/(vo;y|A,U) (152) 

n— >oo n 

= lim - [-E[logpy|A,u(y| A, U)] + E[logpy|,„,Au(y|vo, A, U)]] (153) 
= - lim -E[logZ(y,A,U)]U,^.o()-log(7re) (154) 

n-5-oo n 7=1 



where we define Z{y, A, U) = gy|A,u(y| A, U), and recognize that ( 151 1 can be interpreted as the parti- 
tion function (from which the notation "Z") of a statistical mechanical system with "quenched disorder pa- 
rameters" y, A, U, "state" v ~ g{-) and unnormalized Boltzman distribution 9y|v,A,u(y| v, A, \J)g{v). |^ 
The condition g{-) = go{-0 and 7 = 1 correspond to the case where the assumed prior and noise 
variance in the observation model are "matched", i.e., they coincide with the true priors and noise variance. 



*In this case, 'H(v\y, A, U) — ||y — AUv||^ — ^ logg(v) plays the role of the system's Hamiltonian, and 7 is the inverse 
temperature 1181 . 
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However, it is useful to consider the derivation for general 7 and g{-), since this same derivation will 
apply to the general class of mismatched PMEs defined in Section [IV-A| ). The quantity 

^ = - lim -E[logZ(y, A,U)] (155) 

n— >oo Ti 

is the system per-component free-energy of the underlying physical system. In the following, we shall 
compute £ using the RepUca Method of statistical physics, under the so-called Replica Symmetry (RS) 
assumption ||26ll . |[25l . |[35l . |[T8l . llTTl . Summarizing, the method comprises the following steps: since 



computing the expectation of the log in (155) is usually complicated, we use the identity 

E[log Z(y, A, U)] = lim ^ log (E [Z"(y, A, U)]) (156) 
for u G M+. Then, exchanging limits, we can write 

£ = -lim^ lim - log (E [Z«(y, A, U)]) (157) 

u-¥0 OU n->oo n 



Finally, we evaluate the quantity 



lim -log(E[Z"(y,A,U)]) (158) 



for u positive integer, such that Z^(y, A, U) can be seen as the partition function of a u-fold Cartesian 
product system (i.e., u parallel "replicas" of the original system), with state vectors vi, . . . , v„, and the 
same quenched parameters y, A, U. In particular, we can write 

Z"(y,A,U) = (J)""y'dvi---dv„ |^P^5(v,)j exp |^-7^||y-AUv,f^ . (159) 

The next step consists of calculating 

E[Z"(y,A,U)|A,U] = (160) 
^) y c^vodvi • • -dvu I 5o(vo) ]^5(va) j y exp | -7 ^ ||z - AU(va - Vq) |M ;^e"l'^"' dz 



a=l / \ a=l 



(161) 

Standard Gaussian integration (by completing the squares) yields 

y"exp ^-7f^||z- AU(v,-vo)f^ = (1 + ^7)^" exp (-ntr (RL)) (162) 

where R = U^^Al^AU, as defined in (js}, and where L is a rank-u matrix defined as follows: let = 
Va — vo for a = 1, . . . , u, and let S = [si, . . . , s„]. Then, 



L = -S I . 11 ' ST (163) 

n \ 7 T + n 
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where 1 denotes an all-ones column vector of appropriate dimension. Next, we need to average with 
respect to A, U, i.e., with respect to R. To this purpose, we apply the generalized Harish-Chandra- 
Itzykson-Zuber integral |[T9l . lIlTI as follows: 

lim -logE[Z"(y,A,U)] = 

n— ^-oo n 



= ttlog7 — lilogvr — log(l + wy) 

+ lim - log I / dvo • • • d^ru \ goi^o) TT di^a) ) lE^exp (-ntr (RL)) 



a=l 



(164) 



u log 7 — li log TT — log(l + wy) 



/■A.(L) 



Jirn^ ^ ( / d-vo--- dvu [ go{^o) ^(va) j exp | -n ^ j nn{-w)dw 



a=l 



1=1 



(165) 



where TZ^{'w) denotes the R-transform of R and A(L) denotes the i-th eigenvalue of L. 
Our goal now is to evaluate the limit 



1 



n ' / "^^0 ■ ■ ■ ^"^^ ( fi'o('^o) Jl 5(va) J exp ( -n ^ | n^{-w)dw 



a=l 



i=l 



A.(L) 



(166) 



In order to proceed, we make the common RS assumption and define the empirical correlations 

1 " 

Qa.a' = - VakV*a,k (167) 
n ^ — ' 



k=l 



of vectors Va,Va' for < a,a' < u. Noticing that the limit (166 1 is given as the limit of a normaUzed 
log-sum, Varadhan's lemma yields that this limit is given by the "dominant configuration" of the vectors 
vo,...,Vu, defined in terms of their empirical correlation matrix Q = [Qa,a'\- The RS assumption 
"postulates" that this dominant configuration satisfies the following symmetric form: 

n {€i-u)i + ujii^ 

In Appendix [F] we show that, for Q in the form ( I681, the eigenvalues of L are given by 

A , N ei — w + u{eQ — 2Re{??} + uj) 



Q 



(168) 



Ai = Ai(L) 

A 



7 ^ + u 

A2 = Ai(L) = 7(ei-tj), i = 2,...,ii 
Ai(L) = 0, i = u + l, 



, n. 



(169) 

(170) 
(171) 
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Therefore, we define 



e'"'(Q) 



E 

i=l 
Ai 



A.(L) 



'R'-R{—'w)dw 



TZii{—w)dw + {u — I) / lZ-R_{—w)dw. 







(172) 



(173) 



The argument of the logarithm in ( 166 1 can be interpreted as an expectation with respect to vq, . . . , v„, 
with joint pdf 50(^0) 11"=! 5(^a)- '^^e law of large numbers, this measure satisfies a concentration 



property with respect to the empirical correlations ( 167 1. Hence, we can invoke Cramer's large deviation 
theorem as follows. Since Q is a function of vq, . . . , v„, the conditional pdf of Q given vq, . . . , 



is just a multi-dimensional delta function (i.e., a product of delta functions), hence, we can write 
j dvo---dv^ (^<?o(vo)n5(va)^ xexp(-ng(")(Q) 

J exp (-ng(«)(Q)) ^(r)(dQ|vo, . . . , v„) 



J exp (-n (g(")(Q) + /(")(Q))) dQ 



(174) 

(175) 
(176) 
(177) 



where (177i holds in the sense that, when we consider the quantity (176 1 inside the logarithm in the 



limit (166 1, it can be replaced by (177i. 

The rate function /("^(Q) of the measure iJ,^\dCl) defined as 



// u \ u / n \ 

dvo • • • dv, 50 (Vo) n 5(Va) n M E " (^^^^ 

V a=l / a<a' \k=l J 



E 



a<a' 



dQ, 



(179) 



is given by the Legendre-Fenchel transform of the log-Moment Generating Function (log-MGF) of the 
random vector V = {Vq, Vi, . . . , Vu)'^, where Vq = XqBq and Va = XaBa, Xq, Xi, . . . , Xu are iid 
Gaussian RVs ~ px{x) = eyi\>{—\x\'^/Vx), and Bq, . . . , B„ are independent variables with Bq ~ 
PBq and Ba ^ qs- The MGF of V is given by 



exp 



(vtQv)] 



and the rate function is given by 

/H(Q) =sup{tr(QQ) -logM(")(Q)} 



(180) 



(181) 



Q 
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Eventually, using this into (177) and the resulting expression in the limit (166 1 and applying Varadhan's 
lemma, we arrive at the saddle-point condition 



1 



lim - log ( / ^J.'^HdQ) exp ( -n6;^"^(Q) 

n— >oo n 



inf I (Q) + sup {tr(QQ) - log M^") (Q) } | 

(182) 

inf sup (Q) + tr(QQ) - log M^") (Q) } 

(183) 



Q 



Now we focus on the calculation of the MGF. Under the RS assumption, the supremum in ( 1 82 1 is 
achieved for Q in the form 

c dl^ 

d*l + 

where c, d, g, f are parameters. Using the RS form for Q we obtain 



Q 



(184) 



exp 



2 

4^o + v7EK +(c-^-^)m' + ig-f)±\V^\ 

^ a=l ^ ^ a=l 



(185) 



We use the complex circularly-symmetric version of the scalar Hubbard-Stratonovich transform r34||. 



jl^l' = !l J exp (-r^lzp + 2^Re{x*z}) dz 



for x,z ^ C and r] € M+. Choosing rj = we obtain 

2\ 



exp 



•' a=l 



(186) 

z > \ dz 
(187) 



Using this into ( 185| ), after some straightforward algebra, we find 



M(")(Q) = E 



exp 



\dl 
f 



{d/\d\)Vo\^ + c\Vo\^ ) exp ( J] {2\d\Re{V:z} + {g - /)|K|') j dz 

(188) 



\a=l 



Notice that Vq has a circularly symmetric distribution, therefore ((i/|(i|)Vo and Vq are identically dis- 
tributed. Hence, we can write 

'\d\' 



m(")(Q) = E 



\d\\ 
exp ( -— \z 



Vo\^ + c|Vop) exp {2\d\Re{V:z} + (g - j dz 



(189) 
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Since ( 1 89 1 depends only on \d\, without loss of generality we re-define the parameter d to be in 



Also, notice from ( fl89| ) that lim„^o ^-^^^"''(Q) = 1- 

Following the replica derivation steps outlined at the beginning of this section, we have to determine 



the saddle-point Q*{u) and Q*(u) achieving the extremal condition in (182), for general u, and finally 
replace the result in ( 164| ), differentiate with respect to u and let n — )■ 0. Since the function in (182]) is 
differentiable and admits a minimum and a maximum, following the result of Appendix |G] we have that 
determining the saddle-point (Q*(u), Q*(u)), replacing it in (164i, differentiating the resulting expression 



with respect to u and letting u — yields the same result of replacing in ( 1 82 1 the saddle-point for 
u = 0, denoted by Q*(0) = Q* and Q*(0) = Q*, differentiating the result with respect to u and letting 
ti — )• 0, where now Q*, Q* are constants independent of u. 



Differentiating ( |182[ ) with respect to Q, we obtain the equation 

Q 



E 


VVt exp (vtQV^ 




E 


exp (vtQv) 





(190) 



Since we evaluate the saddle-point conditions at u — )• 0, and since the denominator in ( 190 1 is M(")(Q), 
which is equal to 1 at u J, 0, we can just disregard the denominator and focus on the numerator in 



the following. Using the expression (172i for G^'^\Q), with eigenvalues Ai and A2 given by (169 1, and 
noticing that the RS conditions ( |168| ) and ( |184[ ) yield 



tr(QQ) = eoc + ueig + 2Re{'d}du + u{u - l)u)f 



(191) 



we have that the whole exponent depends only on the real part of '0. Therefore, we re-define i? to be a 
real parameter and differentiate with respect to -d, uj, ei and eo, and impose that the partial derivatives 
are equal to zero. We find the conditions 

1 



d 
f 

9-f 



7 ^ + u 



1 



^r(-Ai 



- 77^r(-A2) 



1 



7 ^ + u 



7 1 + u 



7^R(-Al] 



(192) 

(193) 
(194) 
(195) 
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Evaluating these conditions for u | and noticing that, as u vanishes, Ai — )• A2, we find: 



d* = 7^r(-A^) 
r = lim i f77^R(-A^ 



1 



7 + n 



M-s>0 tt 

_d_ 
du 

7'^r(-A^) + 7'7tR(-A$) {el - 2r + co* - 7(et - oj*)) 
-d* 



1 

+ u 



u=0 



(196) 
(197) 

(198) 
(199) 
(200) 
(201) 



where A2 = 7(e^ — a;*) and where TZii{-) denotes the first derivative of 7^r(-). 



The conditions for eg, e^, in terms of d*,g* and /* are obtained from (190 1, recalling that, by 

definition, eo = Qoo,^! = Qii,^ = Qoi and uj = Q12. In order to obtain more useful expressions for 
these parameters, we use ( |201| ) and ( |200[ ) in ( |189| ) and write 

'{d*f f ( {d^f 



E 



E 



id*? 



exp 



/ 



Vo\A exp ( ^ (2d*Re{K*^} - 

^ Va=l 



y exp (-7/ |z - Vbl^) exp ^-^ l-^ " e«l^l'"dz 



(202) 
(203) 

(204) 
(205) 



where we define i] = {d*) /f* and ^ = d*. 



Focusing on the numerator in (190 1 and following steps similar to the derivation of (202 1, we obtain 
the following expressions for the correlation coefficients eo,i9,ei,a;: 
1) For eo = Qoo we have 



E 



Fop exp {-r]\z - FoP) • exp ^ k - e«l"l'"dz 



a=l 



(206) 



(207) 



2) For ■!? = Qoi, we introduce the RV ^ ~ g(-) (same distribution as any of the VaS) and independent 
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of Vo, Vi- Then, we can write 

r = 



(208) 



E 



E 



E 



a=2 



ulO 

(209) 



^Foexp(-r/|z-yon •^rexp(-e|z-yi|2) -E [exp(-e|z-y|2)]" e«l^l'"dz 

y* exp(-^|z-y|2) 



(210) 



Foexp -r/lz-For 



E[exp(-e|z-F|2)] 



dz 



3) For ei = Qn we have: 



E 



E 



E 



vr 



exp (-r?|z - Fol') • exp (-^1^ - Fil') • exp J] k - KM e«l^l'"dz 



a=2 



(211) 
(212) 

(213) 



exp(-r/|z-VbP) •|Fi|2exp(-e|z-yi|2) •E[exp(-e|z-F|2)]" ^ e^l^l'^dz 

|y|2exp(-e(z-F)2 



exp - Fol ) • E 



E[exp(-^|z-y|2)] 



dz 



(214) 
(215) 



4) For a; = (5i2 we have: 



UJ 



E 



E 



exp {-r]\z - Vo\^) V^V^ exp {-i\z - Fij^) exp (-^1^ - l/sP) • 

•exp e«l^l'"dz 

exp (-r/|z - Fol') V1V2 exp (-^k " ^il') exp {-i\z - l/sP) • 

.E[exp(-e|z-y|2)]"-' e^l^l'^dz 



U4.0 



140 



E 



vr 



exp (-r/lz - Vbl^) 



E 



Vexp(-^|z-y|2) 
E[exp(-Ck-y|2)] 



(216) 
(217) 

(218) 

(219) 
(220) 

(221) 



Finally, we define a single-letter joint probability distribution and restate the expectations appearing in 



(208 1, (212), (216l in terms of this new single-letter model. Let puiC^o) denote the Bernoulli-Gaussian 
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density of Vq, induced hy pxi-) and hy pbq{-), and let 

PY|yo;r?(ybo; V) = ^ exp (-r?|7/ - uop) (222) 
denote the transition probability density of the complex (scalar) circularly symmetric AWGN channel 

Y = Vo + r]--2Z (223) 
with Z ~ CJ\f{0, 1). Also, define the conditional complex circularly symmetric Gaussian pdf 



QY\V;dy\^'^ 6 = - exp (-C|y - wp) 
vr 

and, using Bayes rule, consider the a-posteriori probability distribution 

QY\V;dy\'"'^^)9iv) 



Qv\Y;^{v\y;£.) 



J QY\vAy\'"'^09{v)dv 
exp {-^\y - up) g{v) 



E[exp(-e|y-F|2)] ' 

The joint single-letter probability distribution of interest for the variables Vb, Y and V is given by 

PVo{vo)PY\Vo;viy\'"0; ri)QV\Y;i{v\y; 0- 



(224) 

(225) 
(226) 

(227) 



This explains the decoupled channel single-letter probability model (94 1. 
Now, we can define the conditional mean of V given Y as 

E[V\Y = y] = f vqy\Y;^{v\y;Odv 

>exp(-g|7/-yp) " 
E[exp{-^\y-V\^)]_ 

The corresponding conditional second moment is given by 



E 



E 



|y|2exp(-e|y-y|2) 



(228) 
(229) 

(230) 
(231) 



E[exp(-C|y-l^|2)] 

At this point, it is easy to identify the terms and write the expressions ( |208 1, (212), (216l in terms of 



expectations with respect to the single-letter joint probability measure defined in (227 1. We have 

eo = n\Vo\^] (232) 

= E[yo(E[y|y])*] (233) 

= E[smv(y;0] (234) 



UJ 



E 



\E[V\Y]\' 



(235) 



45 



In order to obtain the desired fixed-points equations for the saddle -point that defines the result in (182 1, 
we notice that 



eo - 2^" + oj" = E \Vo-E[V\Y]f 



and that 



E 



\V -E[V\Y]\'' 



mmse(y|y). 



(236) 

(237) 
(238) 



Using (192 1, (193 1, the equality A2 = 7(e* - co*), and recalling that ^ = d* and t] = (d*)^//*, we arrive 
at the system of fixed-point equations (93ai - (93d]). In the matched case, where qB{ ) = Pb{-) and 7 = 1 



we immediately obtain that S = x and therefore £, = rj, and the fixed-point equations reduce to (17ai 
( [TTbl ) in Claim [T] 



Using the values solution of (93ai - (93dl into (182i, using the trace expression (191 1 and finally 



putting everything together into (164i and taking the derivative w.r.t. u evaluated at u 4, 0, we eventually 



obtain the free energy £ in ( 155 1 as given by 



log(vr/7) + 7 + 



+ — {e^c* + ue\g^ + 2rd*n + u{u - 
d 



d_ 
du 



Tl-R,{—w)dw + [u — 1) / Tl^{—w)dw 



du 



logM(")(Q* 



Examining each term separately, we have: 



d_ 
du 



TlTi{—w)dw + [u — 1) / Tlii{—w)dw 



{e*Q - 2^* + w*)(7-^ +u)- {e\ - uo* + u{e*Q - 2^* + u*)) 



(7-1+^)2 



7^R(-A2)(eS-2^?" + a;'^-7(e^-^*))+ / n^{-w)dw 



iT^r{-x){s - x) + nji{-w)dw 



(239) 
(240) 
(241) 

(242) 



+ / TZii{—w)dw 

(243) 
(244) 
(245) 



where we have used the definition of x and 5 in (93ai and (93b 1, respectively, and the relations (236 1 



and (237 1. For the trace term, recalling that c* = 0, we have 

— je^C^ + uelg* + 2^*d*u + u{u - 1)0;^*} 



(246) 
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Finally, for the log-MGF term we use ( 202 1 and performing the expectation with respect to Vi, . . . ,Vu 
(independent and identically distributed as V) first, we obtain 



E 



^ ' exp (-T]\z-Vof] (E 



Hence, 



d_ 
du 



logM(")(Q* 



-E 
-^E 



^ ' exp [— |z — Vol^) log (e exp [— |z — 



— j |zp exp (^—ri \z — Vol^^ dz 



-E [log (gy;g(y))] + log ^ - ^ - my^?' 



vr 7] 



where in the last line we use (224 1 and define 



(lY;i{v) = / (lY\v-i[v\v)9{v)dv 



E [exp {-C\y - V\^)] 



(247) 

dz 348) 
(249) 
(250) 

(251) 
(252) 



It is understood that if (93a) - (93d I have multiple solutions, then the solution that minimizes the free 
energy should be chosen. 

We conclude by showing that (239]) can be written in the form (97). Putting together (246) and the 



last term of ( |248| and recalUng that E[|Vb|^] = eg and that ^ = we have: 



(253) 



Adding and subtracting u)d* and using (236) and the definition of 5 (see (93b)) we have 



-e^d* + 2d*d'' -vo*d* + oj''d* -uj*f* + e\g'' = {-e*Q + 2^ - uj'')d*' + uj*d* - f* + e^g* 

(254) 

= -Sd* -u}*{f* -d'') + elg*. (255) 

Recalling that g* = f* — d*, we obtain 

- 6d* + {el - uj*)g* (256) 
Recalling that — oo* = A2/7 = x/7> we get 

-6d' + g*xh (257) 
Finally, using d* = ^ and ry = {d*)'^/f* we arrive at 

-6^ + d\r/d' - l)x/l = -5C + d\{rd*)/{d*f - l)xh = + m/v - l)xh (258) 
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Next, we use (258l, the remaining terms of ( 248[ ), (242) and the first terms in (239l, together with the 
saddle-point equations (93a]) - ( |93d i, to eventually obtain £ in the form (97). 



For the case qsi') = Pb{') and 7 = 1, noticing that 5 = x and ^ = i], with x and r/ given by ( 17ai 



(17b I, the free energy takes on the form 

£ = I (Vo; Fo + + (7^R(-^l;) -v)dw + log(e^), (259) 

where Z CM{0, 1) and where we used the fact that when ^ = r] and V ~ Vq, then qy-^iv) = PY^-qiu) = 
[exp(-r?(y-yo)')], so that 

- E [log(7y;g(y)] -logj = h (Vo + ?7-5z) - h (77-5^) = / (^o; Vo + r/-5z) . (260) 

Using (259 1 in the mutual information expression (154i we obtain (16 1 in Claim [T] 



Appendix C 
Proof of Theorem[2] 

We start by recalling some transforms in random matrix theory and some related results from |[38l . 

Definition 1 The rj-tmnsfonn of a nonnegative random variable X is 

1 



r?x(s) =E 



l + sX 



with s > 0. 



Note that 



¥{X = 0) < 7]x{s) < 1 
with the lower bound asymptotically tight as s — ?■ 00. 

Definition 2 The Shannon transform of a nonnegative random variable X is defined as 

Vx(s) =E[log(l + sX)] 

with s > 0. 



(261) 



(262) 



(263) 





Assuming that the logarithm in ( 263 1 is natural, the r/ and Shannon transforms are related through 



d 



Vx{s) 



1 - Vxis) 



(264) 



ds ' ' s 

Also, it is useful to recall here the definition of the S-transform of free probability (see ||38l and references 
therein), which is used in some of the proofs that follow. 
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Definition 3 The S-transform of a nonnegative random variable X is defined as 

^x(z) = + (265) 

where Vx^i') denotes the inverse fimction of the rf-transform. (> 

It is common to denote the ?7-transform, the Shannon transform and the S-transform of the spectral 
distribution of a sequence of nonnegative-definite n x n random matrices B, for n — )• oo, by r7B(')' ^b(') 
and Sb(-)' respectively. In this case, the lower bound in (262]) corresponds to the limiting fraction of 
zero eigenvalues of B. 

Theorem 8 Let A and B be nonnegative asymptotically free random matrices, then for < s < 1, 

^A^is) = Y^^Vl\s)v^\s) (266) 

■ 

In addition, the following implicit relation is also useful: 



??ab(s) = r/A 1-. TV (267) 

The next two results are instrumental to the proof of Theorem [2] While they might have appeared 
elsewhere, a simple and self-contained proof is given here for the sake of completeness. 

Theorem 9 Let A and B be nonnegative asymptotically free random matrices. For s > 0, let {rj,a,i') 
be the solution of the system of equations: 

V = ??A (a s) (268) 

V = Vb (i^s) (269) 

V = T-^ (270) 

Then, the r]-transform of AB is given by 

VABis) = r]. (271) 



Proof: Letting r]AB{s) = r]{s) for simplicity of notation and using ( |267| ), we have: 

•nis) = ilA{a s) (272) 

where 
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which is equivalent, using Definition [3} to: 

1 / 1 



Letting 



a \r7(s) 

1 ( 1 



r/(s) 



(274) 



as \T]{s) 

from (272 1 and ( 274 ), Theorem |9] follows immediately. ■ 
As a consequence of Theorem [9j we have: 

Theorem 10 Let A and B Z?^ nonnegative asymptotically free random matrices. The Shannon-transform 
of AB is given by 



Vab(s) = Va (a s) + Vb {ys) - log(l + a vs) 



(275) 



where a and v are the solutions of the system of equations (268) - {270), which depend on s. 



Proof: The proof follows an idea originated in ||331 to write the Shannon transform when the rj- 
transform is given as the solution of a fixed-point equation: for any differentiable function /, the definition 
of the Shannon transform of an arbitrary nonnegative random variable X leads to 



d 

Ax 



Vx{sf{s))=¥. 



[sf{s) + f{s))X 
l + sf{s)X 



(276) 



where the "dot" here denotes differentiation with respect to the variable s. Since both sides of (275 1 are 



equal to zero at s = 0, it is sufficient to show that the derivatives with respect to s of both sides of (275 1 
coincide. Letting A and B denote random variables distributed according to the spectral distribution of 



A and B, respectively, differentiating w.r.t. s the difference of the right side minus the left side of (275 1 
yields 



E 



(as + a)k 



1 + asA 

ds + Q 



+ E 



(z>s + z^)B 
1 + usE 



au + aOs + aiys 1 — ??ab('5) 
\ + avs s 



(277) 



a s 



(l-??A(as))H {l-r]^{vs)) , , : (278) 



as + a vs + v 

{l-r]) + (1 -??) 



a s 



us 



1 + aus 



av + avs + avs 1 — rj 
1 + aus s 



{av + avs + avs) 







1 — rj 



a vs 



(279) 
(280) 

(281) 
(282) 
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where used (264i to write the left side of ( |277[ ); the right side of (277) follows from the definition of the 



r/-transform; (280 1 follows from Theorem M for {r},a,v) solutions of (268 1 - (270 1; and ( |282[ ) follows 



again from the equality in (270 1. 

Theorem [2] now follows as an application of Theorem 10 by identifying the terms. We write 



lim -/(x; y| A, U, b) 

n— >oo n 

lim -E 

n— >oo n 

lim -E 

n— >oo n 

lim -E 

n— >oo n 



logdet [1 + P^AUBB^'U+aI' 
logdet {l + ^^.U+AtAUBBt 
logdet {l + VxRSB^ 

VRBBt {Vx) 

Vn{a Vx) + VBBt (i^Vx) - log(l + a vV, 



(283) 
(284) 
(285) 
(286) 
(287) 
(288) 



where {a,v) are solutions of (268 1 - (270 1 after replacing A by R, B by BB^^ and 7 by Vx- The final 



expressions ( [T9| ) and ( pO| ) follow by noticing that the spectral distribution of B has only two mass points 
at zero and at one, with probabilities 1 — q and q, respectively. 

Appendix D 
Proof of Theorems |4] and [5] 

Notations are as in Section I-A following the observation model ([T]|. In particular, we let X = diag(x) 
and B = diag(b), and v = Xb = Bx. 
Proof of bound (|49|l: We have 

/(v;y|A,U) < E [logdet fl + gP^AUU^'A^') 



(289) 

where the inequality follows by the fact that, conditionally on A, U, the differential entropy of y = 
AUv + z for assigned covariance 

E [y y 1 1 A , U] = I + ^Pa; AUU"^ A^ (290) 

is maximized by a Gaussian complex circularly symmetric distribution y ~ CA/'(0,I + (/'P^^AUUI^AI^). 
Recalling the definition of R in (|3]l, we have 



lim -E 

n— !>oo n 



logdet (I + gP^AUU"f At 



E[log(l + gP,|R|2) 
Vn{qVx), 



(291) 
(292) 
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from the definition of Shannon transform. Hence, (49 1 follows. 

Proof of bound ( [50) l: This bound can be regarded as a "matched filter bound" on the vector channel 
with input v = Bx and output y. We can write 



/(v;y|A,U) = /(v; AUv + z| A, U) 

n 

= J]/(V-;AUv + z|A,U,y/-i) 



1=1 

n 



< ^l{Vi;AV^ + z,v:i,\A,U,Vt') 
1=1 

n 

^I{Vi■AV^r + z\A,Vr\V^+^) 
1=1 

n 

^I{Vi;AuiVi + z\A,V) 



i=l 
n 



I (Vf, ujAtAu^V- + W^\A, U 



(293) 
(294) 

(295) 

(296) 

(297) 

(298) 



2 = 1 



where (a) follows from the fact that v is iid, in (b) we define Uj to be the i-th columns of U and in (c) 
we define u|a1^z = VFj ~ CA/'(0, utAl^A^Uj), conditionally on A,U. Dividing both sides by n, letting 
defining the iid variables Zj ~ CA/'(0, 1) and taking the limit, we obtain 



Ti < lim - V / (Vi- A/ujAtAu,K + Zi A, v] 

i=l ^ ^ 



(a) 

< lim / V:: 



1 " 

- J^ujAtAu^y^ + Z, 

\ i=i 



A,U 



(299) 

(300) 
(301) 



= I [Vo; ^/E\mVo + Z 

where Z ~ Zi, where by definition ^ X]j"=i ujA^^Auj = ^tr(R) — )• E[|Rp] and where in (a) we used 
Jensen's inequality and the fact that the mutual information I{V; ^/sV + Z), for any distribution of V 
with bounded second moment, is concave in s |[T6ll . 
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Proof of bound (51 1: Let U = [ui, . . . , u„] where Uj denotes the j-th column of U. Then, we have 



I(v;y|A,U) = ^ /(F^; y| A, U, l/.^i^ 



1=1 



i=l 



j=i+l 



i=l 



j=i+l 



i=l 



A,U,F,!ti 



A,U 



(302) 
(303) 
(304) 
(305) 



where ( 303 1 follows by the chain rule and by subtracting the conditioning term, preserving the mutual 



information, (304i holds for any linear projection defined by the vector gj, function of A,U, (305 1 
follows by noticing that the arguments of the mutual information do not depend any longer on Vj^i- 

Next, we choose to be the linear MMSE (LMMSE) receiver for "user" i, of the formally equivalent 
CDMA system 

Vi 



Ti = AUiVi + A [Ui, U2, . . . , Ui_i] 



u, 



V2 



i-1 



(306) 



In particular, using E[|Vip] = qVx, we obtain 



We indicate by 



I + gP^AU,-iUj_i At 



V^^\qV,;i/n) = ujAt I + gP,AUi_iUT_, A 



-1 



Aui 



it At 



Au,; 



(307) 



(308) 



the corresponding multiuser efficiency of the LMMSE detector for "user" i. Noticing that, in the limit of 
n — )• oo, the residual noise plus interference at the output of the LMMSE detector is marginally Gaussian 
(we omit the explicit proof of this well-known fact, which holds under the assumptions of our model). 



11381 letting /3 = i/n, and denoting by ri{qVx; P) the limiting multiuser efficiency for n — t- oo, from (305 1 
we arrive at ( [ST] ) by dividing by n and taking the limit. 
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Appendix E 
Proof of the Decoupling Principle 

Notations and definitions are as in Section 



IV 



and Appendix pi We let (^okj^kj^k) denote the K-th 



components of the random vectors bo,b,b, obeying the joint n-vaiiate conditional distribution (84i for 



given A, U, with b = b(y, A, U) given by (82 1. We are interested in showing that the asymptotic joint 
marginal distribution of (6ok, b^., b^), for some generic index k, converges to the joint distribution of the 



triple {Bo,B, B) given by (86 1 in Section IV independent of k. 



To this purpose, we follow in the footsteps of 11181 and consider the calculation of the joint moments 
E[6q^6k] for arbitrary integers i,j>0. Since the moments are uniformly bounded, the K-th joint marginal 
distribution is thus uniquely determined due to Carleman's Theorem |[T3l p. 227]. The desired result will 
follow upon showing that the moments converge to limits independent of k. Furthermore, as we will see. 



the form of the asymptotic moments yields explicitly the joint distribution of (Bq, B, B) given in (86 1. 

In order to proceed, we define the replicated model given by the distribution of bo, y, bi, . . . , b^j, for 
given A, U, as: 



Pbo(bo)Py|bo,A,u(y|bo, A,U) Jl gb|y,A,u(ba|y, A,U). 



(309) 



a=l 



All expectations in the following derivations are with respect to the joint measure (309). For a function 
/(bo, bi, . . . , b„), we define 

u 

z(")(y,A,U,bo;/i) = J] e'^^(^«'^--'^") 9b(ba)'7y|b,A,u(y|ba, A, U). (310) 

bi,...,bu a=l 

By HH Lemma 1], if E[/(bo, bi, . . . , bu)|y, A, U, bo] is 0(n) and does not depend on u, then 



lim -E[/(bo,bi,...,b„)] = lim lim ^- logE[z(")(y, A, U, bo; /i)] 

rn-oo n rt-s>oo -u-s-O On n 



(311) 



h=0 



In our case, we let /(bo, bi, . . . , b^) = X]fc=i bokKnk given i, j > 0, and some replica index m G 
{1, . . . , u}. By the symmetry with respect to the replica index and the indices of the vector components, 
for any k we can write 



lim -E 

ra— s>oo n 



.k=l 



= lim -E[/(bo,bi,...,b„)]. 

n— >oo n 

Using the procedure outlined before, we need to calculate 

lim lim^-logE[z(")(y,A,U,bo;/i)] 

n->oo oh n 



(312) 
(313) 

(314) 
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As usual in replica derivations, we switch limits and calculate first 

lim -logE[Z(")(y,A,U,bo;/i)]. 



n— s>oo n 



(315) 



In passing, we notice that Z(")(y, A, U, bo; /i = 0) = Z^"(y, A,U), so that the calculation of (314i is 
closely related to the calculation of the free energy by the replica method in Appendix |B] i.e., to the 
evaluation of the limit 



lim -logE[Z"(y,A,U)]. 

n—>-oo n 



(316) 



Operating along the same steps leading to (166 1 in the derivation of the free energy (see Appendix 
[B] ), we arrive at: 



lim -logE[z(")(y,A,U,bo;/i)] 

ra— >oo n 



= nlog7 — lilogvr — log(l + M7)+ 

^ j r u u 

+ lim -log dXo ■ ■ ■ dXuPhoO^o) Y\ QhiK) Y\ P^O^a) 

n— 5>oo n \ ^ — ' J 



,bo,...,b„ 



a=l 



a=Q 



■ exp /i ^ ^ofc^mfc ■ exp -ra ^ / nn{-w)dw 



k=l 



i=l 



A.(L) 



(317) 
(318) 

(319) 
(320) 



We notice that the second exponential term in (|320[) is identical to what appears in the computation 



of (316 1 and, following the steps in Appendix \B\ yields an exponential term exp(— n^*^")(Q)) given in 
(172]), function of the empirical correlations of the vectors = X^ba as defined in ( |167| ), and collected 
in the empirical correlation matrix Q whose form, under the RS assumption, is given in (168 1. 
Invoking the large deviation theorem, we can write 

/u u / n \ 

dXo • • • dX„Pbo(bo) n '?b(ba) n Px(Xa) • exp h^bi^bi, ■ exp (-ng(")(Q) 

Do,...,D„ a=l a=0 V k=l J 



E 



I exp (-ng(")(Q)) fi^:\dQ; h\ho, . . . ,b„, Xq 
exp(-ng(")(Q))/i(r)(dQ;/i) 
j exp (-n (g(")(Q) + l(")(Q; /i))) dQ 



I x„) 



(321) 
(322) 

(323) 
(324) 



where the approximation step holds in the sense that when n gets large we can replace the argument of 



the logarithm in (319 1 with the quantity in (324i. 
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Using Cramer's theorem, we have that the rate function /("^(Q; h) for the measure 

/u u 
%(ba) J|px(Xa) 
._ 1 1 



bo,...,b„ 



a=l 



a=l 



k=l 



E 



a<a' \k=l 
\ u / n 



exp U ^ ^Ofc^ifc n I ^kahkaXla'hla' " f^Qa,a' 1 
V k=l / a<a' \k=l / 

is given by the Legendre-Fenchel transform 

l(")(Q;/i) = sup|tr(QQ) -logM(")(Q;/i)} 



Q 



where the relevant MGF for the measure pTJ\ is 

M(")(Q;/i) =E 



exp ( hBl^Bi, + b^X^QXb 



(325) 
(326) 
(327) 

(328) 
(329) 



where we define b = {Bq, Bi, . . . , and X = {Xq, Xi, . . . , X^)^, with Bq ~ psg (Bemoulli-g), 

Ba ~ qs (marginal of the assumed prior distribution %(•)) for 1 < a < and Xa ~ Px{x) = 
1 e-l^'l'/^- for all < a < n. 

TtVj; — — 



Plugging (329 1 into (324i and the resulting expression in the limit of ^ log(-) appearing in (3 19 1 and 
applying Varadhan's lemma, we arrive at the saddle-point condition 



lim -log( / ^(r^(dQ)exp(-ngW(Q) 

n— s>oo n 



infsup{g(")(Q) +tr(QQ) -logM(")(Q;/i)} 

(330) 



Q 



Following the replica derivation steps outlined at the beginning of this Appendix, we have to determine 



the saddle-point Q*{h) and Q*(/i) achieving the extremal condition in (330 1, for general h, and finally 



replace the result in ( 315| ), differentiate with respect to h and evaluate the result for h = 0. Using 
again the result of Appendix [G) since the function in (315 1 is differentiable and admits a minimum and 



a maximum, we can replace the saddle-point of (330 1 for h = 0, denoted by Q* and Q*, and then 
differentiate the result with respect to to h and let /i — )■ 0. Noticing that for /i = the saddle-point 



condition (330 1 coincides with the saddle-point condition (182), we have that Q* and Q* coincide with 



what derived in Appendix [B] for the free energy (3 16 1. In particular, under the RS assumption, these 
parameters are given by the fixed-point equations (93a i - ( 93d[ ). Furthermore, since (330 1 and therefore 
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the whole Umit ( 315[ ) depends on h only through the log-MGF term, using (311 1 and (3 12 1 we arrive at 



lim ^logM(")(Q^/i) 
M->o oh 



h=0 



lim 

u-j>0 



E 


B'qBL exp (b^X^Q^Xb' 




E 


exp (b^X^Q^Xb' 







(331) 
(332) 



The denominator of (331 1 is identical to the MGF M*^")(Q*) defined in ( 180l and, as shown in Appendix 
telwe have that limu_j.o M(")(Q*) = 1. As for the numerator, we follow steps similar to the derivation 



of ( 202 1 and obtain 



E[6^,64J = E / exp (-r?|z - XoBo\') ■ exp (-^1^ - X^B„ 



exp|-e j e^N^^dz 



-Blexp (-7]\z - XoBol^) ■ B^exp (-C\z - X^B„ 



E [exp(-^lz-XSp)]" e' 



dz 



E 



5^exp(-r/|z-Xo5on -K 



5Jexp(-C|z-XB|2) 
E[exp(-^|z-X5|2)] 



dz 



(333) 

(334) 

(335) 
(336) 

(337) 
(338) 



where we define X ~ Xa for a = 0, . . . , n, i? ~ for a = 1, . . . , u, and where the parameters rj and 
^ are given by the fixed-point equations ( |93a i - (93d]). 

Finally, we define a single-letter joint probability distribution and restate the expectations appearing in 
( |338| ) in terms of this new single-letter model. We let 



(339) 



denote the transition probability density of the complex circularly symmetric AWGN channel with 
Gaussian circularly symmetric fading not known at the receiver. 



Y = XQBQ + ri—2Z, 
with Z ~ CA/'(0, 1) and Xq ~ px{x). Also, we define the conditional pdf 



(lY\B;i{y\^ 



n 



exp (— — px{x)dx 



(340) 



(341) 
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and, using Bayes rule, consider the a-posteriori probability distribution 

QY\b;^{y\b)qB{b) 



QB\Y;^{b\y) 



Eb' QY\b'dy\b)qBib') 

J exp (-^|y - xfop) qB{b)px{x)dx 



(342) 



E[exp(-C|y-Xi?|2)] 

The joint single-letter probability distribution of interest for the variables Bq, Y and B is given by 



(343) 



PBo,Y,B;ri,i^{bo,y,b;r],^) =pB„{bo) PY\Bo;viy\^0) qB\Y;i{b\y) ■ 



(344) 



With these definitions, it is immediate to identify the moment expression (338 1 as the joint moment of 
the single-letter probability distribution ( 344[ ), by writing: 

' B^ exp(-£,\z - XB\^ 



^5jexp(-r?|z-Xo5o|2) -E 
vr 



E 



Bf, 



E[exp{-^\z- XB\^) 
^ exp {-r]\z - xqBoI'^) Px{xo)dxo ■ 



dz 



vr 



E 



B^ f exp (— — xSp) px{x)dx 



dz 



E[exp{-^\z- XB\^)] 

J {Yl'^oPY\Bo;riiz\bo)pBo{bo)^ ' (^b^QB\Y;i{ 

XlXl / bhb^PBo{bo)PY\Bo;v(^\bo)qB\Y;^{b\z)dz 

bo b 
E[B'qB^] 



]z) dz 



(345) 
(346) 
(347) 

(348) 

(349) 
(350) 



where the expectation in (350 1 the last line is with respect to the probability distribution (344 1. 

Summarizing, we have that as far as the joint probability distribution of each component of b in 
11) and the corresponding component of the PME b (matched or mismatched), the system decouples 



asymptotically into a bank of "parallel" AWGN channels of the form (340 1, with symbol-by-symbol 
PME given by 



B = E[B\Y]=J2 bqB\Y;ii{b\Y), 



(351) 



for Bq,Y,B distributed as in (344 1, where the parameters rj and are given by ( |93a i - (93d). 
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Appendix F 
Eigenvalues of the matrix L 



The eigenvalues of L are readily computed from (163 1. Notice that the matrix (I 



7 



lit) has 



eigenvalues 



1 



(352) 



1 + uj 

• = i^u = 1, corresponding to eigenvectors 
e2, . . . ,6^ forming an orthonormal basis of the orthogonal complement of Spanjl} in C". It follows 
that 



corresponding to the (normalized) eigenvector ;^1, and 1^2 



L = ^SE diag (-^, 1, . . . , 1) Etst 



where 



E 



u 



1,62, • • ,e„ 



(353) 



(354) 



(355) 



The non-zero eigenvalues of L are the same as those of the "flipped" matrix 

"'"^ i^YT^Y 1. ■ . . , 1) Et (2 Sts) E diag (yOr, 1. . , , . ; 
Under the RS assumption, the empirical correlation matrix of the vectors si, . . . ,Su takes on the form 

-S^S ^ (a-/3)I + /31lt (356) 

n 

Using the orthonormality properties of the columns of E, we have 

Et ({a - (3)1 + /31lt) E = diag (a + (n - l)/3, a - (3, . . . ,a - 13) (357) 



Finally, we have that under the RS assumption and in the limit of large n the eigenvalues of L are given 
by 

a + {u- 1)(3 



Ai 

Aa 



7 ^ + u 
^{a — j3), for a = 2,...,u 



Using the fact that Sq = Xab„ — Xobo, we have that 

a = ei + eo - 2Re{^9}, /3 = a; + eo - 2Re{??} 



(358) 
(359) 

(360) 



Therefore, the eigenvalues (358 1 can be expressed in terms of the correlations eQ,ei,'d,uj in the form 
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Appendix G 

a property of stationary points of multivariate functions 

Let f{t,v,6) be a differentiable multivariate function with t G C^, v G and ^ € M. Let t„ with 
n = 1, . . . , N, V£ with £ = 1, . . . , L denote the n-th and i-th component of t and v respectively. We are 
interested in evaluating: 

^inf sup /(t,v, 61)19=0 



Let: 



then: 



[t*(e),v*(e)] =arginfsup/(t,v, 

t V 



-^infsup/(t,v,0) = ±f(t*(e),v*{e),e) (36i) 



N 



= J]/,„(t*,v*,^)^t:((0)) (362) 

71=1 



d 



+ Y,fvAt\^*,0)-vU{0)) + fe{t*,^\ 0) (363) 



with 



/t„(t*,v*,e) = A/(t,v,^)|t=t.,v=v, (364) 

/,„(t*(^),v*(0),e) = A/(t,v,0)|t=t.,v=v*, (365) 

fg{t*{0),^*{O),e) = _/(t,v,0)|t=t^v=v*. (366) 

Under the assumption that the supremum and the infimum are achieved by /(t,v, 0), by Fermat's 
theorem every local extremum of a differentiable function is a stationary point hence by their definition 
t*{e),v*{e) are such that for all 6 

/t„(t*,v*,0) = 0, (367) 

A„(t*,v*,0) = 0. (368) 

Hence i363i becomes: 



-^infsup/(t,v,0) = fgit*ie),v*ie),e) (369) 

Consequently 

^infsup/(t,v,^)|e=o = /e(t*(0),v*(0),0) (370) 
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from which it follows that we are allowed to compute the saddle-point (and hence the fixed-point equation) 
for ^ = 0, then replace the result in the multivariate function, and differentiate the result w.r.t. to 9 and 
then let 6 = 0. 



Appendix H 
Useful formulas 

This Appendix is devoted to provide methods and explicit formulas to evaluate the quantities appearing 
in the main results. It is worthwhile to notice that the numerical evaluation of the fixed-point equations and 
the corresponding free energy is not completely trivial from a numerical stability viewpoint, especially 
for large signal-to-noise ratio qVx and small sparsity q and sampling rate p. Therefore, some care must 
be dedicated to avoid as much as possible brute-force numerical integration. 

We start by considering the calculation of /(Vb; \/aVo + Z), for Vq = XqBo Bernoulli-Gaussian, and 



Z ~ CM{0, 1), which is instrumental in evaluating ( |T6[ ) and the bounds (50l and (51 1, for suitable choices 
of the parameter a > 0. We can write 



I{Vo;V^Vo + Z) = h{^Vo + Z)-h{Z) 



log 



l + aVx 



-\Y\y{l+aV^) 



+ (1 



-|y|= 



lege (371) 



where Y = ^/aVQ + Z. The expectation in (371 1, can be calculated by integration in polar coordinates 
and, after some algebra, takes on the form 

. -(l+aP.)r \ g-r^^ 



log 



l + aVx 



+ (1 



+ {l-q)j^ log(^- 



+ (1 



e "^dr. 



(372) 



Finally, both the above integrals can be efficiently and accurately evaluated by using Gauss-Laguerre 
quadratures. 



Similarly, the MMSE term appearing in ( 17b i can be calculated as follows. Letting Y = Vq + r] 2 Z, 
we have 

E[yo|l^] = G(|y|^ry,g,P,)-^g-y, 

where 

G{z;ri,q,Vx) = — 



T+fc^ exp(-/iz) 



exp{—ij,z) + {1 — q) exp{—r]z) 
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and where /i = r//(l + VxT^)- Notice that for g = 1 the observation model becomes jointly Gaussian, and 
we obtain the usual Gaussian MMSE estimator E[Vo|^] = x^^^^- The resulting MMSE in the general 
BemoulU-Gaussian case is given by 

2 



mmse(Vb|M) + ^ '^Z 



q^x 



E[Gi\Y\^;7],q,V.,f\Y\ 



Performing integration in polar coordinates and after some algebra we obtain 



mmse ( Vo|Vo + r/ 2Z]=q 



1 



r/(l + V^v) 



$ -{1 + V,7j) 



1-q 



2, 



where $(o, s, z) is known as the Hurwitz-Lerch zeta function ll47l . defined as 

1 



Hz,s,a) 



T{s) 



oo ^s— Ig— at 

1 — ze~ 



:dt, 



that can also be efficiently evaluated by Gauss-Laguerre quadratures. It is immediate to check that for 
q = I (jointly Gaussian case) we have 

-Px 



mmse Vo|^o + f? '^Z 



l+VxTj' 



as expected. 



In order to evaluate Xi in (16 1 it is useful to have the integral of the R-transform TZu{—w) in closed 



form. For the case of U with iid elements, using (24i we find, trivially. 



/ TlB,{-w)dw = p log(l + x) ■ 
Jo 



(373) 



'0 

For the case of Haar-distributed U, using (36 1, we find 

1 







nn{-w)dw = -(l + x-p-2plog(2(l-p))+log(l-p) 

-(1 - 2p) log(l + x-2p + p)+ log(l + x(l - 2p) + p)), 



(374) 



where p = a/(1 + xY - '^XP- 

We conclude by providing the derivation of the closed-form expression of E [|Vo — v{Y; ^)p] for the 
Lasso estimator, given in ( |144 i. We have 

¥.[\Vo-v{Y;i)\^] = gP, + EP(y;OP]-2Re{E[Fo*??(y;0]}. (375) 



62 



Recalling the expression of v{Y;^) in (142 1, we have 

1 " 



n\v{Y;0\' 



E 



\Y\ 



2e. 



J\y\>l/(20 V 2?/ 



PY{y)dy 



que-'"'' + (1 - q)rje-'T' 



2rdr 



r-oo / ^ 
If 

'1/(20 V 2^ 

POO 

2qfi / [r^ - + r/i'i(,'^)] e-^''"dr 

POO 

+2(1 - q)r] / p - rVC + r/(4e2)] e'^'^'dr. 
Ji/(20 



In order to solve the integrals in ( 376 1 we use 



2axe dx 



-ab" 



2^ 



2ax^e "-^ dx 



{l + ab^)e 



By applying the above integrals in (376 1 and after some manipulation, we obtain 

(1-a 



E[\v{Y;0\' 



'7r/i'erfc(A/ fi' 



V 



7r?7'erfc( A/r/') 



(376) 

(377) 
(378) 
(379) 

(380) 



with /i' = /i/(4^2) an^j ^/ ^ r//(4^2)_ 

Next, we calculate the expectation E[yQ*?;(y; ^)] as follows: 

Woviy-^o] = ^ ^0 ri-^ ^^^^ 

= qE X* \Y\-^ ^ So = 1 . (381) 
We notice that {Xq , Y) given i?o = 1 are jointly Gaussian, with mean zero and covariance matrix 

Vx v^ + i/v 



Cov(Xo,y) = 

Then, Xq given Y is Gaussian with mean 

E[Xq\Y] 

and variance 



-Y 



Var(Xo|y) = V 



Vx + l/r]' 
■p2 
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Using iterated expectation, we can calculate the expectation in (381 1 as 



E 













^0 






Bo = 1 


= E 









E[Xo\Y,Bo = l] 











1] 






Bo = l 









E 















\Y\ 


Bo = \ 






+ 



V. + 1/r? 7|j;|>l/(20 



2C 



2/i r 



1/(25) 



2^ 



Using again the integrals (377i - (379 1, we obtain 

f oo 



2/i r 



1/(2?) 



1 

2e 



\y\PY\Bo=iiy)dy 

r^e-^'^dr. (382) 



(383) 



Finally, replacing all terms in (375 1, after some simplifications, we obtain (144). 
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